Abstract

An  integral  part  of  modeling  the  global  view  of  network  security  is  constructing  attack  graphs.  In  practice, 
attack graphs are produced manually by Red Teams. Construction by hand, however, is tedious, error-prone,
and impractical for attack graphs larger than a hundred nodes. In this paper we present an
automated  technique  for  generating  and  analyzing  attack  graphs.  We  base  our  technique  on  symbolic 
model  checking  [4]  algorithms,  letting  us  construct  attack  graphs  automatically  and  efficiently.  We  also 
describe  two  analyses  to  help  decide  which  attacks  would  be  most  cost  effective  to  guard  against.  We 
implemented  our  technique  in  a  tool  suite  and  tested  it  on  a  small  network  example,  which  includes  models 
of  a  firewall  and  an  intrusion  detection  system. 

1.  Overview 

As  networks  of  hosts  continue  to  grow  in  size  and  complexity,  evaluating  their  vulnerability  to 
attack becomes increasingly important to automate. There are several tools, such as COPS [10] and
Renaud  Deraison’s  Nessus  Security  Scanner  [9],  that  report  vulnerabilities  of  individual  hosts.  To  evaluate 
the  vulnerability  of  a  network  of  hosts,  however,  we  also  have  to  analyze  the  effects  of  interactions  of  local 
vulnerabilities  and  find  global  vulnerabilities  introduced  by  the  interconnections  between  hosts.  A  typical 
process  for  vulnerability  analysis  of  a  network  proceeds  as  follows.  First,  we  determine  vulnerabilities  of 
individual  hosts  using  scanning  tools,  such  as  COPS  and  Nessus  Scanner.  Using  this  local  vulnerability 
information  along  with  other  information  about  the  network,  such  as  connectivity  between  hosts,  we  then 
produce  attack  graphs.  Each  path  in  an  attack  graph  is  a  series  of  exploits,  which  we  call  atomic  attacks, 
that  leads  to  an  undesirable  state,  e.g.,  a  state  where  an  intruder  has  obtained  administrative  access  to  a 
critical  host.  We  can  then  perform  further  analyses,  such  as  risk  analysis  [21],  reliability  analysis  [13],  or 
shortest  path  analysis  [23],  on  the  attack  graph  to  assess  the  overall  vulnerability  of  the  network. 

Constructing  attack  graphs  is  a  crucial  part  of  doing  vulnerability  analysis  of  a  network  of  hosts. 
Construction  by  hand,  however,  is  tedious,  error-prone,  and  impractical  for  attack  graphs  larger  than  a 
hundred  nodes.  Automating  the  process  of  constructing  attack  graphs  also  ensures  that  the  attack  graphs  are 
exhaustive  and  succinct.  An  attack  graph  is  exhaustive  if  it  covers  all  possible  attacks,  and  succinct  if  it 
contains  only  those  network  states  from  which  the  intruder  can  reach  his  goal.  We  follow  these  steps  to 
produce  and  analyze  attack  graphs: 

1. Model the network.

We model the network as a finite state machine, where state transitions correspond to atomic
attacks launched by the intruder. We also specify a desired security property (e.g., an intruder should
never obtain root access to host A). The intruder’s goal generally corresponds to violating this
property.


2.  Produce  an  attack  graph. 

Using the model from Step 1, our modified version of the model checker NuSMV [16]
automatically produces the attack graph. The graphs are rendered using the GraphViz visualization
package [1].

Figure 1. Tool Suite


3. Analysis of attack graphs.

A raw attack graph is a low-level state transition diagram. To allow the domain specialist to
analyze it in a meaningful way, we parse the graph and reconstruct the original meanings of the state
variables as they relate to the network intrusion domain. In Section 4 we discuss two different
analyses on attack graphs that quantify the likelihood of intruder success.

Figure  1  shows  the  architecture  of  our  tool  suite.  We  do  not  require  or  expect  users  of  our  tool 
suite  to  have  model  checking  expertise.  Instead  of  using  the  input  language  of  the  NuSMV  model  checker, 
a  user  may  describe  the  network  model  and  desired  property  in  XML  [5].  We  built  a  special-purpose 
compiler  that  takes  an  XML  description  and  translates  it  into  the  input  language  of  NuSMV. 

In  the  field  of  model  checking,  the  use  of  fundamental  data  structures,  such  as  Binary  Decision 
Diagrams  (BDDs)  [2],  enabled  significant  advances  in  the  size  of  the  systems  that  can  be  analyzed  [3,  4]. 
More  recently,  model  checking  researchers  have  developed  a  variety  of  reduction  and  abstraction 
techniques  to  handle  even  larger,  possibly  infinite  state  spaces.  Since  our  techniques  build  upon  the 
underlying  representation  and  algorithms  used  in  model  checking,  we  are  able  to  leverage  the  recent 
success  in  that  field.  As  model  checkers  handle  larger  state  spaces,  our  analysis  can  be  applied  to  larger 
networks. 

Our  paper  reports  on  the  following  contributions  to  analyzing  vulnerabilities  in  networks: 

•  We  exhibit  an  algorithm  for  automatic  generation  of  attack  graphs.  The  algorithm  generates 
exhaustive  and  succinct  attack  graphs.  We  provide  a  tool,  as  a  part  of  a  larger  tool  suite,  which 
implements  the  algorithm. 

•  Through  a  small  case  study,  we  identify  a  level  of  atomicity  appropriate  for  describing  a  model  of 
the  network  and  an  intruder’s  arsenal  of  atomic  attacks.  The  model  is  abstract  enough  to  be 
understood  by  security  domain  experts,  yet  simple  enough  for  our  tool  to  analyze  efficiently. 

•  Our  network  model  includes  intrusion  detection  components  and  distinguishes  between  stealthy 
and  detectable  attack  variants.  We  are  able  to  generate  “stealthy”  attack  subgraphs  (i.e.  subgraphs 
with  attacks  that  are  not  detected  by  the  intrusion  detection  components).  Analysis  of  stealthy 
attack  subgraphs  reveals  the  best  locations  for  placing  additional  intrusion  detection  components. 

•  We  describe  two  ways  of  analyzing  attack  graphs:  an  algorithm  for  determining  a  minimal  set  of 
atomic  attacks  whose  prevention  would  guarantee  that  the  intruder  will  fail,  and  a  probabilistic 
reliability  analysis  that  determines  the  likelihood  that  the  intruder  will  succeed. 

Paper organization: We give a detailed description of our attack graph generation algorithm in
Section 2. We describe an intrusion detection case study in Section 3 and results of attack graph analysis in
Section 4. We discuss related work in Section 5 and close with suggestions for future work in Section 6.

2.  Attack  Graphs 

First, we formally define attack graphs, the data structure used to represent all possible attacks on
a network.

Definition 1 An attack graph or AG is a tuple $G = (S, \tau, S_0, S_s)$, where $S$ is a set of states, $\tau \subseteq S \times S$ is a
transition relation, $S_0 \subseteq S$ is a set of initial states, and $S_s \subseteq S$ is a set of success states.

Intuitively, $S_s$ denotes the set of states where the intruder has achieved his goals. Unless stated
otherwise, we assume that the transition relation $\tau$ is total. We define an execution fragment as a finite
sequence of states $s_0 s_1 \ldots s_n$ such that $(s_i, s_{i+1}) \in \tau$ for all $0 \le i < n$. An execution fragment with $s_0 \in S_0$ is an
execution, and an execution whose final state is in $S_s$ is an attack, i.e., the execution corresponds to a
sequence of atomic attacks leading to the intruder’s goal state.
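
As an illustration, the definitions above can be written down directly as a small data structure. This is a minimal sketch, not part of the tool suite; the class name `AttackGraph`, the helper names, and the three-state example are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackGraph:
    """Explicit-state rendering of the attack graph tuple (names are illustrative)."""
    states: frozenset        # set of states
    transitions: frozenset   # transition relation, stored as (s, s') pairs
    initial: frozenset       # initial states
    success: frozenset       # success states


def is_execution_fragment(g, seq):
    """A non-empty finite sequence whose consecutive pairs are all transitions."""
    return (len(seq) > 0
            and all(s in g.states for s in seq)
            and all((a, b) in g.transitions for a, b in zip(seq, seq[1:])))


def is_execution(g, seq):
    """An execution fragment that starts in an initial state."""
    return is_execution_fragment(g, seq) and seq[0] in g.initial


def is_attack(g, seq):
    """An execution whose final state is a success state."""
    return is_execution(g, seq) and seq[-1] in g.success


# Hypothetical example: s0 -> s1 -> s2, with a self-loop on s2.
example = AttackGraph(
    states=frozenset({"s0", "s1", "s2"}),
    transitions=frozenset({("s0", "s1"), ("s1", "s2"), ("s2", "s2")}),
    initial=frozenset({"s0"}),
    success=frozenset({"s2"}),
)
```

In this example the sequence s0, s1, s2 is an attack, while s1, s2 is only an execution fragment because it does not start in an initial state.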

2.1.  Constructing  Attack  Graphs 

Model checking is a technique for checking whether a formal model M of a system satisfies a
given property p. If the property is false in the model, model checkers typically output a counterexample,
or a sequence of transitions ending with a violation of the property.

In the model checker NuSMV, the model M is a finite labeled transition system and p is a property
written in Computation Tree Logic (CTL). In this paper, we consider only safety properties, which in CTL
have the form AG f (i.e., p = AG f, where f is a formula in propositional logic). If the model M satisfies the
property p, NuSMV reports “true.” If M does not satisfy p, NuSMV produces a counterexample. In our
context M is a model of the network and p is a safety property. A single counterexample shows an attack
that leads to a violation of the safety property.

Attack graphs depict ways in which an intruder can force a network into an unsafe state. We can
express the property that an unsafe state cannot be reached as:

AG(¬unsafe)

When this property is false, there are unsafe states that are reachable from the initial state. The precise
meaning of unsafe depends on the network. For example, the property given below might be used to say
that the privilege level of the adversary on the host with index 2 should always be less than the root
(administrative) privilege.


Input:

    $S$ - set of states
    $R \subseteq S \times S$ - transition relation
    $S_0 \subseteq S$ - set of initial states
    $L : S \to 2^{AP}$ - labeling of states with propositional formulas
    $p$ = AG(¬unsafe) (a safety property)

Output:

    attack graph $G_p = (S_{\mathrm{unsafe}}, R^p, S_0^p, S_s^p)$

Algorithm: GenerateAttackGraph($S$, $R$, $S_0$, $L$, $p$)

    (* Use model checking to find the set of states $S_{\mathrm{unsafe}}$ that violate the safety property
       AG(¬unsafe). *)
    $S_{\mathrm{unsafe}}$ = modelCheck($S$, $R$, $S_0$, $L$, $p$)

    (* Restrict the transition relation $R$ to states in the set $S_{\mathrm{unsafe}}$ *)
    $R^p = R \cap (S_{\mathrm{unsafe}} \times S_{\mathrm{unsafe}})$
    $S_0^p = S_0 \cap S_{\mathrm{unsafe}}$
    $S_s^p = \{ s \mid s \in S_{\mathrm{unsafe}} \wedge \mathit{unsafe} \in L(s) \}$

    Return($S_{\mathrm{unsafe}}$, $R^p$, $S_0^p$, $S_s^p$)


Figure 2. Algorithm for Generating Attack Graphs
AG(network.adversary.privilege[2] < network.priv.root)

We briefly describe the algorithm for constructing attack graphs for the property AG(¬unsafe).
The first step is to determine the set of states $S_r$ that are reachable from the initial state. Next, the algorithm
computes the set of reachable states $S_{\mathrm{unsafe}}$ that have a path to an unsafe state. The set of states $S_{\mathrm{unsafe}}$ is
computed using an iterative algorithm derived from a fix-point characterization of the AG operator [4]. Let
$R$ be the transition relation of the model, i.e., $(s, s') \in R$ if and only if there is a transition from state $s$ to $s'$.
By restricting the domain and range of $R$ to $S_{\mathrm{unsafe}}$ we obtain a transition relation $R^p$ that encapsulates the
edges of the attack graph. Therefore, the attack graph is $(S_{\mathrm{unsafe}}, R^p, S_0^p, S_s^p)$, where $S_{\mathrm{unsafe}}$ and $R^p$ represent
the set of nodes and set of edges of the graph, respectively; $S_0^p = S_0 \cap S_{\mathrm{unsafe}}$ is the set of initial states; and
$S_s^p = \{ s \mid s \in S_{\mathrm{unsafe}} \wedge \mathit{unsafe} \in L(s) \}$ is the set of success states. This algorithm is given in Figure 2.

In symbolic model checkers, such as NuSMV, the transition relation and sets of states are
represented using BDDs [2], a compact representation for boolean functions. There are efficient BDD
algorithms for all operations used in our algorithm.
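
To make the construction concrete, the sketch below carries out the same steps on explicit Python sets instead of BDDs: forward reachability from the initial states, backward reachability from the unsafe states, and restriction of the transition relation. It is only an illustration under that explicit-state assumption and does not reflect how NuSMV computes the sets symbolically; the function names and the five-state example are hypothetical.

```python
from collections import deque


def closure(starts, edges):
    """Return all states reachable from `starts` by following `edges` (pairs)."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    seen = set(starts)
    queue = deque(starts)
    while queue:
        s = queue.popleft()
        for t in successors.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def generate_attack_graph(R, S0, L):
    """Explicit-state analogue of the algorithm in Figure 2.

    R  : set of (s, s') transitions
    S0 : set of initial states
    L  : dict mapping a state to its set of propositions
    """
    # Step 1: states reachable from the initial states.
    reachable = closure(S0, R)
    # Step 2: reachable states that have a path to an unsafe state,
    # found by searching backwards from the reachable unsafe states.
    unsafe_now = {s for s in reachable if "unsafe" in L.get(s, set())}
    reversed_edges = {(b, a) for a, b in R if a in reachable and b in reachable}
    s_unsafe = closure(unsafe_now, reversed_edges)
    # Step 3: restrict the transition relation and the distinguished sets.
    r_p = {(a, b) for a, b in R if a in s_unsafe and b in s_unsafe}
    s0_p = set(S0) & s_unsafe
    ss_p = {s for s in s_unsafe if "unsafe" in L.get(s, set())}
    return s_unsafe, r_p, s0_p, ss_p


# Hypothetical model: state 3 is a dead end, state 4 is unreachable.
R_example = {(0, 1), (1, 2), (0, 3), (3, 3), (4, 2), (2, 2)}
nodes, edges, init, success = generate_attack_graph(R_example, {0}, {2: {"unsafe"}})
```

Here state 3 is dropped because it cannot reach an unsafe state, and state 4 is dropped because it is not reachable from the initial state.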

2.2.  Attack  Graph  Properties 

We can show that an attack graph G generated by the algorithm in Figure 2 is exhaustive (Lemma
1a) and succinct (Lemma 1b). Whereas succinctness is a property about states in an attack graph, Lemma
1c states a similar property for transitions. Appendix A contains a proof of the lemma.


Lemma 1

(a) (Exhaustive) An execution e of the input model $(S, R, S_0, L)$ violates the property p = AG(¬unsafe) if
and only if e is an attack in the attack graph $G = (S_{\mathrm{unsafe}}, R^p, S_0^p, S_s^p)$.

(b) (Succinct states) A state s of the input model $(S, R, S_0, L)$ is in the attack graph G if and only if there
is an attack in G that contains s.

(c) (Succinct transitions) A transition $t = (s_1, s_2)$ of the input model $(S, R, S_0, L)$ is in the attack graph G
if and only if there is an attack in G that includes t.

3.  An  Intrusion  Detection  Example 

Consider the example network shown in Figure 3. There are two target hosts, ip1 and ip2, and a
firewall separating them from the rest of the Internet. As shown, each host is running two of three possible
services (ftp, sshd, a database). An intrusion detection system (IDS) watches the network traffic between
the target hosts and the outside world. There are four possible atomic attacks, identified numerically as
follows: (0) sshd buffer overflow, (1) ftp .rhosts, (2) remote login, and (3) local buffer overflow (an
explanation of each attack follows). If an atomic attack is detectable, the intrusion detection system will
trigger an alarm; if an attack is stealthy, the IDS misses it. The ftp .rhosts attack needs to find the target
host with two vulnerabilities: a writable home directory and an executable command shell assigned to the
ftp user name. The local buffer overflow exploits a vulnerable version of the xterm executable.

The intruder launches his attack starting from a single computer, ipa, which lies outside the
firewall. His eventual goal is to disrupt the functioning of the database. For that, the intruder needs root
access on the database host ip2.

We construct a finite state model of the network so that each state transition corresponds to a
single atomic attack by the intruder. A state in the model represents the state of the system between atomic
attacks. A typical transition from state $s_1$ to state $s_2$ corresponds to an atomic attack whose preconditions are
satisfied in $s_1$ and whose postconditions hold in state $s_2$. An attack is a sequence of state transitions
culminating in the intruder achieving his goal. The entire attack graph is thus a representation of all the
ways the intruder can succeed.
3.1. Finite State Model


The network. We model a network as a set of facts, each represented as a relational predicate. The state of
the network specifies services, host vulnerabilities, connectivity between hosts, and a remote login trust
relation. Following Ritchey and Ammann [20], connectivity is expressed as a ternary relation
$R \subseteq Host \times Host \times Port$, where $R(h_1, h_2, p)$ means that host $h_2$ is reachable from host $h_1$ on port $p$. Note that the
connectivity relation incorporates firewalls and other elements that restrict the ability of one host to connect
to another. Slightly abusing notation, we say $R(h_1, h_2)$ when there is a network route from $h_1$ to $h_2$.
Similarly, we model trust as a binary relation $Tr \subseteq Host \times Host$, where $Tr(h_1, h_2)$ indicates that a user may
log in from host $h_2$ to host $h_1$ without authentication (i.e., host $h_1$ “trusts” host $h_2$).

Initially, there is no trust between any of the hosts; the trust relation $Tr$ is empty. The connectivity
relation $R$ is shown in the following table. An entry in the table corresponds to a pair of hosts $(h_1, h_2)$. Each
entry is a triple of boolean values. The first value is ‘y’ if $h_1$ and $h_2$ are connected by a physical link, the
second value is ‘y’ if $h_1$ can connect to $h_2$ on the ftp port, and the third value is ‘y’ if $h_1$ can connect to $h_2$
on the sshd port.

The intruder. The intruder has a store of knowledge about the target network and its users. This
knowledge includes host addresses, known vulnerabilities, information about running services, etc. The
function $plvl_A$: Hosts → {none, user, root} gives the level of privilege that intruder A has on each host.
There is a total order on the privilege levels: none < user < root. Initially, the intruder has root access on
his own machine ipa, but no access to the other hosts.
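
The network and intruder state described above can be pictured as plain sets and a dictionary. The sketch below is illustrative only: the connectivity triples are hypothetical placeholders rather than the entries of the connectivity table, and deriving a route from any open port in `has_route` is an assumption made here for brevity.

```python
from enum import IntEnum


class Priv(IntEnum):
    """Privilege levels with the total order none < user < root."""
    NONE = 0
    USER = 1
    ROOT = 2


# Hypothetical connectivity triples (h1, h2, port): h2 is reachable from h1 on port.
R = {
    ("ipa", "ip1", "ftp"),
    ("ip1", "ip2", "ftp"),
    ("ip1", "ip2", "sshd"),
}

# Trust pairs (h1, h2): h1 trusts h2. Initially there is no trust.
Tr = set()

# Intruder privilege on each host: root on his own machine, none elsewhere.
plvl = {"ipa": Priv.ROOT, "ip1": Priv.NONE, "ip2": Priv.NONE}


def reachable_on(h1, h2, port):
    """R(h1, h2, p): h2 is reachable from h1 on the given port."""
    return (h1, h2, port) in R


def has_route(h1, h2):
    """R(h1, h2): here assumed to hold when any port is reachable."""
    return any((a, b) == (h1, h2) for a, b, _ in R)


def trusts(h1, h2):
    """Tr(h1, h2): a user may log in from h2 to h1 without authentication."""
    return (h1, h2) in Tr
```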
[Table: connectivity relation R between hosts. Only one entry, "y,n,n", is recoverable.]

Intrusion detection system. Atomic attacks are classified as being either detectable or stealthy with
respect  to  the  Intrusion  Detection  System  (IDS).  If  an  attack  is  detectable,  it  will  trigger  an  alarm  when 
executed  on  a  host  or  network  segment  monitored  by  the  IDS.  If  an  attack  is  stealthy,  the  IDS  does  not  see 
it. 


We specify the IDS with a function ids: Host × Host × Attack → {d, s, b}, where ids($h_1$, $h_2$, a) = d
if attack a is detectable when executed with source host $h_1$ and target host $h_2$; ids($h_1$, $h_2$, a) = s if attack a is
stealthy when executed with source host $h_1$ and target host $h_2$; and ids($h_1$, $h_2$, a) = b if attack a has both
detectable and stealthy strains, and success in detecting the attack depends on which strain is used. When
$h_1$ and $h_2$ refer to the same host, ids($h_1$, $h_2$, a) specifies the intrusion detection system component (if any)
located on that host. When $h_1$ and $h_2$ refer to different hosts, ids($h_1$, $h_2$, a) specifies the intrusion detection
system component (if any) monitoring the network path between $h_1$ and $h_2$.

In  addition,  a  global  boolean  variable  specifies  whether  the  IDS  alarm  has  been  triggered  by  any 
previously  executed  atomic  attack. 

In our example, the paths between (ipa, ip1) and between (ipa, ip2) are monitored by a single
network-based IDS. The path between (ip1, ip2) is not monitored. There are no host-based intrusion
detection components.
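
A possible reading of the ids function for this example is sketched below. It is an assumption of this sketch, not a statement of the encoding actually used, that an attack on an unmonitored path or on a host without a host-based component is treated as stealthy. The per-attack classification follows the atomic attack descriptions in this section; the identifiers are hypothetical.

```python
# Paths watched by the single network-based IDS (unordered host pairs).
MONITORED_PATHS = {frozenset({"ipa", "ip1"}), frozenset({"ipa", "ip2"})}

# Classification of each atomic attack: d = detectable, s = stealthy, b = both strains.
ATTACK_CLASS = {
    "sshd-buffer-overflow": "b",
    "ftp-rhosts": "s",
    "remote-login": "d",
    "local-buffer-overflow": "s",
}


def ids(h1, h2, attack):
    """Return 'd', 's', or 'b' for `attack` run with source h1 and target h2."""
    if h1 == h2:
        # No host-based IDS components exist in the example.
        return "s"
    if frozenset({h1, h2}) not in MONITORED_PATHS:
        # Unmonitored path: the IDS does not see the attack (assumption).
        return "s"
    return ATTACK_CLASS[attack]


def update_alarm(alarm, strain):
    """Global alarm flag: once raised by a detectable strain, it stays raised."""
    return alarm or strain == "d"
```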
Figure 3. Example Network

Atomic Attacks. We model four atomic attacks:

1. sshd buffer overflow: This remote-to-root attack immediately gives a remote user a root shell on the
target machine. It has detectable and stealthy variants.

2. ftp .rhosts: Using an ftp vulnerability, the intruder creates an .rhosts file in the ftp home directory,
creating a remote login trust relationship between his machine and the target machine. This attack is
stealthy.

3. remote login: Using an existing remote login trust relationship between two machines, the intruder logs
in from one machine to another, getting a user shell without supplying a password. This operation is usually
a legitimate action performed by regular users, but from the intruder’s point of view, it is an atomic attack.
This attack is detectable.

4. local buffer overflow: If the intruder has acquired a user shell on the target machine, the next step is to
exploit a buffer overflow vulnerability on a setuid root file to gain root access. The intruder may transfer
the necessary binary code via ftp (or scp) or create it locally using an editor such as vi. This attack is
stealthy.


Each  atomic  attack  is  a  rule  that  describes  how  the  intruder  can  change  the  network  or  add  to  his 
knowledge  about  it.  A  specification  of  an  atomic  attack  has  four  components:  intruder  preconditions, 
network  preconditions,  intruder  effects,  and  network  effects.  The  intruder  preconditions  component  lists  the 
intruder’s capabilities and knowledge required to launch the atomic attack. The network preconditions
component lists the facts about the network that must hold before launching the atomic attack. Finally, the
intruder and network effects components list the attack’s effects on the intruder and on the network state,
respectively.  For  example,  the  sshd  buffer  overflow  attack  is  specified  as  follows: 

attack sshd-buffer-overflow is
    intruder preconditions
        (* User-level privileges on host S *)
        $plvl_A(S) \ge user$
        (* No root-level privileges on host T *)
        $plvl_A(T) < root$
    network preconditions
        (* Host T is running sshd *)
        $ssh_T$
        (* Host T is reachable from S on port sp *)
        $R(S, T, sp)$
    intruder effects
        (* Root-level privileges on host T *)
        $plvl_A(T) := root$
    network effects
        (* Host T is not running sshd *)
        $\neg ssh_T$
end
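
The four-part specification maps naturally onto a guard and an update over a state record. The sketch below shows one way this could look for the sshd buffer overflow; the `NetState` record, the field names, and the port label "sshd" are assumptions introduced for illustration and are not the XML or NuSMV encoding used by the tool suite.

```python
from dataclasses import dataclass, replace

NONE, USER, ROOT = 0, 1, 2   # total order none < user < root


@dataclass(frozen=True)
class NetState:
    plvl: tuple       # sorted (host, privilege) pairs for the intruder
    sshd: frozenset   # hosts currently running sshd
    conn: frozenset   # connectivity triples (source, target, port)


def privilege(state, host):
    """Intruder privilege level on `host` (NONE if unrecorded)."""
    return dict(state.plvl).get(host, NONE)


def sshd_overflow_enabled(state, src, tgt, port="sshd"):
    """Intruder and network preconditions of the attack."""
    return (privilege(state, src) >= USER          # user-level privileges on S
            and privilege(state, tgt) < ROOT       # no root privileges on T yet
            and tgt in state.sshd                  # T is running sshd
            and (src, tgt, port) in state.conn)    # T reachable from S on port


def sshd_overflow_apply(state, src, tgt, port="sshd"):
    """Intruder and network effects; raises if the preconditions fail."""
    if not sshd_overflow_enabled(state, src, tgt, port):
        raise ValueError("preconditions not satisfied")
    levels = dict(state.plvl)
    levels[tgt] = ROOT                             # intruder effect
    return replace(state,
                   plvl=tuple(sorted(levels.items())),
                   sshd=state.sshd - {tgt})        # network effect


before = NetState(
    plvl=(("ip1", NONE), ("ipa", ROOT)),
    sshd=frozenset({"ip1"}),
    conn=frozenset({("ipa", "ip1", "sshd")}),
)
after = sshd_overflow_apply(before, "ipa", "ip1")
```

Because the network effect stops sshd on the target, the same attack is no longer enabled in the resulting state.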

3.2.  NuSMV  Encoding 

It  is  necessary  to  ensure  that  the  model  checker  considers  all  atomic  attacks  in  each  state,  so  that 
the  resulting  attack  graph  enumerates  all  possible  attacks.  So  the  model  checker  must  choose  attacks 
nondeterministically, subject to preconditions being fulfilled. We also allow nondeterministic choices for
the  source  host  and  the  target  host  of  each  atomic  attack.  The  NuSMV  encoding  of  the  model  contains 
nondeterministically  assigned  state  variables  that  specify: 

•  which  attack  (concretely,  an  attack  number)  will  be  tried  next 

•  the  source  host  from  which  the  atomic  attack  will  be  initiated 

•  the  target  host  of  the  atomic  attack 

•  whether  the  next  attack  is  detectable  or  stealthy  with  respect  to  a  given  intrusion  detection  system. 
This  variable  is  set  deterministically  when  the  next  attack  is  known  to  be  detectable  or  stealthy. 
When  the  next  attack  has  both  detectable  and  stealthy  strains,  the  variable  is  set 
nondeterministically. 

In an effort to reduce the state space of the model, the NuSMV encoding restricts the legal states
to  those  where  the  attack  number,  source,  and  target  variables  correspond  to  an  enabled  attack.  In  addition, 
when  a  variable’s  value  is  irrelevant  in  a  particular  context,  we  deterministically  set  the  variable  to  a  fixed 
value  in  that  context.  As  an  example,  when  the  next  attack  is  local  to  one  host,  we  force  the  value  of  the 
variable  designating  the  source  host  of  the  attack  to  be  the  same  as  the  target  host  of  the  attack. 
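
The effect of these restrictions can be pictured as enumerating, in each state, only the (attack, source, target) triples that are enabled, with the source forced to equal the target for local attacks. The sketch below is a Python analogy rather than NuSMV input; the `is_enabled` predicate and the toy rule used in the example are hypothetical.

```python
from itertools import product

HOSTS = ("ipa", "ip1", "ip2")
LOCAL_ATTACKS = frozenset({3})   # attack 3: local buffer overflow


def enabled_choices(attack_ids, is_enabled):
    """List the legal nondeterministic choices for the next atomic attack.

    `is_enabled(attack, source, target)` stands in for the precondition check
    in the current state.
    """
    choices = []
    for attack, target in product(sorted(attack_ids), HOSTS):
        # For a local attack the source is irrelevant, so fix it to the target.
        sources = (target,) if attack in LOCAL_ATTACKS else HOSTS
        for source in sources:
            if is_enabled(attack, source, target):
                choices.append((attack, source, target))
    return choices


def toy_enabled(attack, source, target):
    """Hypothetical state: remote attacks work only from ipa to ip1."""
    if attack in LOCAL_ATTACKS:
        return target == "ip1"
    return source == "ipa" and target == "ip1"


choices = enabled_choices({0, 3}, toy_enabled)
```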

3.3.  Experimental  Results:  Attack  Graphs 

Recall  that  the  goal  of  our  intruder  is  to  obtain  access  to  the  database  service  running  on  host  ip2. 
For  that,  the  intruder  needs  to  get  root  access  on  ip2  without  triggering  an  IDS  alarm.  Thus,  the  property  we 
want  to  violate  (in  order  to  get  the  attack  graph)  is  that  either  an  intruder  never  gets  root  privilege  on  host 
ip2 or he is detected by the IDS:

AG(network.adversary.privilege[2] < network.priv.root ∨ network.detected)

Figure 4 shows the attack graph produced by NuSMV for this property. Each node is labeled by an
attack id number (see table below), which corresponds to the atomic attack to be attempted next; a flag S/D
indicating whether the attack is stealthy or detectable by the intrusion detection system; and the numbers of
the source and target hosts. The following tables show attack and host numbers.


no.  attack
0    sshd buffer overflow
1    ftp .rhosts
2    remote login
3    local buffer overflow

no.  host
0    ipa
1    ip1
2    ip2

Any  path  in  the  graph  from  a  root  node  to  a  leaf  node  shows  a  sequence  of  atomic  attacks  that  the 
intruder  can  employ  to  achieve  his  goal  while  remaining  undetected.  For  instance,  the  path  highlighted  by 
double  boxed  nodes  consists  of  the  following  sequence  of  four  atomic  attacks:  overflow  sshd  buffer  on  host 
1, overwrite .rhosts file on host 2 to establish rsh trust between hosts 1 and 2, log in using rsh from host 1 to
host  2,  and  finally,  overflow  a  local  buffer  on  host  2  to  obtain  root  privileges. 

3.4. Performance Observations


We conducted the experiments on a 1 GHz Pentium III machine running RedHat Linux 7.0.

The NuSMV encoding of the simple network in Figure 3 has 91 bits of state (i.e., potentially $2^{91}$
states),  but  only  101  states  are  reachable.  The  tool  automatically  found  an  appropriate  BDD  variable 
ordering  under  which  the  run  time  of  the  tool  on  this  example  is  about  5  seconds. 

To gauge how the run time depends on the scale of the model, we enlarged the example with two
additional  hosts,  four  additional  atomic  attacks,  several  new  vulnerabilities,  and  flexible  firewall 
configurations.  The  enlarged  model  has  229  bits  of  state  and  6190  reachable  states.  The  attack  graph  has 
5948  nodes  and  68364  edges.  NuSMV  took  2  hours  to  construct  the  attack  graph  for  this  model;  however, 
the  model  checking  part  took  only  5  minutes.  The  performance  bottleneck  is  inside  our  graph  generation 
procedure,  and  we  are  working  on  performance  enhancements. 

4.  Analysis  of  Attack  Graphs 

Once  we  have  an  attack  graph  generated  for  a  specific  network  with  respect  to  a  given  safety 
property,  the  user  may  wish  to  probe  it  for  further  analysis.  For  example,  an  analyst  may  be  faced  with  a 
choice  of  deploying  either  additional  network  attack  detection  tools  or  prevention  techniques.  Which  would 
be  more  cost-effective  to  deploy?  In  doing  the  minimization  analysis  described  in  Sections  4.1  through  4.3, 
the  analyst  can  determine  a  minimal  set  of  atomic  attacks  that  must  be  prevented  to  guarantee  that  the 
intruder  cannot  achieve  his  goal.  In  doing  the  reliability  analysis  described  in  Section  4.4,  the  analyst  can 
determine  the  likelihood  that  an  intruder  will  succeed  or  the  likelihood  that  the  IDS  will  detect  his  attack 
activity. 

4.1.  Minimization  Analysis 

Given  a  fixed  set  of  atomic  attacks,  not  all  of  them  may  be  available  to  the  intruder.  Can  we  find  a 
minimal  set  of  atomic  attacks  that  we  should  prevent  so  that  the  intruder  fails  to  achieve  his  goal?  To 
answer  this  question,  we  modify  the  model  slightly,  making  only  a  subset  of  atomic  attacks  available  to  the 
intruder.  For  simplicity,  we  nondeterministically  decide  which  subset  to  consider  initially,  before  any  attack 
begins;  once  the  choice  is  made,  the  subset  of  available  atomic  attacks  remains  constant  during  any  given 
attack.  We  ran  the  model  checker  on  the  modified  model  with  the  invariant  property  that  says  the  intruder 
never  gets  root  privilege  on  host  ip2. 

AG(network.adversary.privilege[2] < network.priv.root)

The  post-processor  marked  the  states  where  the  intruder  has  been  detected  by  the  IDS.  The  result 
is  shown  in  Figure  5.  The  white  rectangles  indicate  states  where  the  attacker  had  not  yet  been  detected  by 
the  intrusion  detection  system.  The  black  rectangles  are  states  where  the  intrusion  detection  system  has 
sounded the alarm. Thus, white leaf nodes are desirable for the attacker in that the objective is achieved
without detection. Black leaf nodes are less desirable: the attacker achieves his objective, but the alarm
goes off.


The  resolution  of  which  atomic  attacks  are  available  to  the  intruder  happens  in  the  circular  nodes 
near  the  root  of  the  graph.  The  first  transition  out  of  the  root  (initial)  state  picks  the  subset  of  attacks  that 
the  intruder  will  use.  Each  child  of  the  root  node  is  itself  the  root  of  a  disjoint  subgraph  where  the  subset  of 
atomic  attacks  chosen  for  that  child  is  used.  Note  that  the  number  of  such  subgraphs  descending  from  the 
root  node  corresponds  to  the  number  of  subsets  of  atomic  attacks  with  which  the  intruder  can  be  successful. 
The  model  checker  determines  that  for  any  other  possible  subset,  there  is  no  possible  successful  sequence 
of  atomic  attacks. 

The  root  of  the  graph  in  Figure  5  has  two  subgraphs,  corresponding  to  two  subsets  of  atomic 
attacks  that  will  allow  the  intruder  to  succeed.  In  the  left  subgraph  the  sshd  buffer  overflow  attack  is  not 
available  to  the  intruder;  it  can  readily  be  seen  that  the  intruder  can  still  succeed,  but  cannot  do  so  while 
remaining undetected by the IDS. In the right subgraph, all attacks are available. Thus, the entire attack
graph  implies  that  all  atomic  attacks  other  than  the  sshd  attack  are  indispensable:  the  intruder  cannot 
succeed  without  them.  The  analyst  can  use  this  information  to  guide  decisions  on  which  network  defenses 
can  be  profitably  upgraded. 

The  white  cluster  in  the  middle  of  the  figure  is  isomorphic  to  the  scenario  graph  presented  in 
Figure  4;  it  shows  the  ways  in  which  the  intruder  can  achieve  his  objective  without  detection  (i.e.,  all  paths 
by  which  the  intruder  reaches  a  white  leaf  in  the  graph). 

Checking every possible subset of attacks is exponential in the number of attacks. In the next
subsection, we show that finding the minimum set of atomic attacks which must be removed to thwart the
intruder is in fact NP-complete. Then in the following subsection we also show how a minimal set can be
found in polynomial time.
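
The subset experiment of this section can be mimicked on a small explicit graph by trying every subset of atomic attacks and testing whether the goal stays reachable. The sketch below is a brute-force illustration and is exponential in the number of attacks; it is not the model-checking run described above, and the edge list is a hypothetical toy chosen so that, as in Figure 5, exactly two subsets succeed.

```python
from itertools import combinations


def can_succeed(edges, start, goals, available):
    """Depth-first search using only edges whose attack is in `available`."""
    seen = {start}
    stack = [start]
    while stack:
        s = stack.pop()
        if s in goals:
            return True
        for src, attack, dst in edges:
            if src == s and attack in available and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return False


def successful_subsets(edges, start, goals, attacks):
    """Return every subset of `attacks` with which the intruder can succeed."""
    result = []
    for k in range(len(attacks) + 1):
        for subset in combinations(sorted(attacks), k):
            if can_succeed(edges, start, goals, set(subset)):
                result.append(set(subset))
    return result


# Hypothetical labeled edges (source, attack number, destination).
toy_edges = [
    ("init", 1, "a"), ("a", 2, "b"), ("b", 3, "goal"),
    ("init", 0, "c"), ("c", 1, "d"), ("d", 2, "e"), ("e", 3, "goal"),
]
subsets = successful_subsets(toy_edges, "init", {"goal"}, {0, 1, 2, 3})
```

In this toy graph the intruder succeeds only with the subsets {1, 2, 3} and {0, 1, 2, 3}, so attacks 1, 2, and 3 are indispensable while attack 0 is not.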

4.2.  Minimum  and  Minimal  Critical  Attack  Sets 

Assume  that  we  have  produced  an  attack  graph  corresponding  to  the  following  safety  property: 

AG(¬unsafe)

Let $A$ be the set of attacks. Let $G = (S, E, s_0, L)$ be the attack graph, where $S$ is the set of states,
$E \subseteq S \times S$ is the set of edges, $s_0 \in S$ is the initial state, and $L: E \to A \cup \{\varepsilon\}$ is a labeling function where $L(e) = a$
if an edge $e = (s \to s')$ corresponds to an attack $a$, otherwise $L(e) = \varepsilon$. Given a state $s \in S$, a set of attacks
$C$ is critical with respect to $s$ if and only if the intruder cannot reach his goal from $s$ when the attacks in $C$
are removed from his arsenal. Equivalently, $C$ is critical with respect to $s$ if and only if every path from $s$
to an unsafe state has at least one edge labeled with an attack $a \in C$.

A critical set corresponding to a state s is minimal (denoted A(s)) if no proper subset of A(s) is critical
with respect to s. A critical set corresponding to a state s is minimum (denoted M(s)) if there is no critical
set M'(s) such that |M'(s)| < |M(s)|. In general, there can be multiple minimum and multiple minimal
critical sets corresponding to a state s. Of course, all minimum critical sets must be of the same size.
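To make the difference between minimal and minimum concrete, the following Python sketch checks both notions on a toy graph. The edge-list representation, the function names, and the four-state example are illustrative assumptions, not part of the tool suite; None stands for the ε label.

```python
from collections import deque
from itertools import combinations


def is_critical(edges, s0, unsafe, removed):
    """True if no unsafe state is reachable from s0 once every edge
    labeled with an attack in `removed` has been deleted."""
    adj = {}
    for src, dst, label in edges:
        if label not in removed:  # edges labeled None (epsilon) always stay
            adj.setdefault(src, []).append(dst)
    seen, queue = {s0}, deque([s0])
    while queue:
        s = queue.popleft()
        if s in unsafe:
            return False
        for t in adj.get(s, []):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True


def is_minimal(edges, s0, unsafe, c):
    """A critical set is minimal if no proper subset is critical.
    Supersets of critical sets are critical, so it is enough to
    test the subsets obtained by dropping a single attack."""
    return is_critical(edges, s0, unsafe, c) and all(
        not is_critical(edges, s0, unsafe, c - {a}) for a in c
    )


def minimum_critical_sets(edges, s0, unsafe, attacks):
    """Enumerate subsets by increasing size (exponential in len(attacks))
    and return all critical sets of the smallest size found."""
    for k in range(len(attacks) + 1):
        found = [set(c) for c in combinations(sorted(attacks), k)
                 if is_critical(edges, s0, unsafe, set(c))]
        if found:
            return found
    return []


# Hypothetical graph: two paths 0 -a-> 1 -b-> 3 and 0 -c-> 2 -b-> 3, state 3 unsafe.
EDGES = [(0, 1, "a"), (1, 3, "b"), (0, 2, "c"), (2, 3, "b")]
ATTACKS = {"a", "b", "c"}
```

On this toy graph {b} is the only minimum critical set, {a, c} is minimal but not minimum, and {a, b} is critical but not minimal.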


Figure  5.  Attack  Graph  Analysis 


Given an attack graph G = (S, E, s_0, L), consider the problem of finding a minimum critical set of
attacks M(s_0). We will call this problem the Minimum Critical Set of Attacks (MCSA) problem. We prove
that the decision version of MCSA is NP-complete.

Lemma 2 Assume that we are given an attack graph G = (S, E, s_0, L) and an integer k. The problem of
determining whether there is a critical set C(s_0) such that |C(s_0)| ≤ k is NP-complete.

Proof Sketch: First, we prove that the problem is in NP. Guess a set C ⊆ A with size ≤ k. We need to
check that C is a critical set of attacks. This can be accomplished in polynomial time using the procedure
isCritical(G, C) described below. Therefore, the problem is in NP.

To prove that the problem is NP-hard, we give a reduction from the minimum cover problem [11,
Page 222]. See Appendix B for the remaining details of the proof.

4.3. Computing Minimal Critical Sets

Consider now the problem of finding a minimal critical set A(s_0) corresponding to the initial state
s_0. We give an algorithm for computing A(s_0) that runs in time O(mn), where m = |S| + |E| is the size of
the attack graph G and n = |A| is the number of attacks. First, we describe a procedure isCritical(G, C),
which determines whether a set C ⊆ A is a critical set corresponding to the initial state s_0. This procedure
runs in O(m) time. We simply delete all edges from G that are labeled with an attack from the set C. After
that, if an unsafe state is still reachable from the initial state s_0, then C is not a critical set (because there is a
path from s_0 to an unsafe state which does not use an attack from the set C). This step can be performed in
O(m) time using standard graph algorithms [6]. The algorithm starts with C as the empty set. At each step
of the algorithm we perform the following procedure:

If isCritical(G, C) returns true, the algorithm stops and returns C. Otherwise, pick an a ∈ A \ C
and add it to the set C.

We start with an empty set and keep adding attacks until we obtain a critical set. Notice that since
the full attack set A is itself a critical set, the number of steps taken by the algorithm is at most n. Each step
takes O(m) time, so that the worst-case running time of the algorithm is O(mn). If attacks have costs
associated with them, then at each step we can pick an attack that has the minimum cost, i.e., pick an
a ∈ A \ C with the minimum cost. This will bias the procedure to pick sets with lower cost.
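The procedure above can be sketched in Python as follows. The edge-list encoding, the function names, the cost dictionary, and the toy graph are assumptions made for illustration. The final pruning pass does not appear in the text: the growth loop on its own returns a critical set, and pruning is one assumed way to ensure that no proper subset of the result is critical, at the same O(mn) cost.

```python
from collections import deque


def is_critical(edges, s0, unsafe, removed):
    """O(m) check: delete edges labeled with an attack in `removed`,
    then test whether an unsafe state is still reachable from s0."""
    adj = {}
    for src, dst, label in edges:
        if label not in removed:  # None (epsilon) edges are never deleted
            adj.setdefault(src, []).append(dst)
    seen, queue = {s0}, deque([s0])
    while queue:
        s = queue.popleft()
        if s in unsafe:
            return False
        for t in adj.get(s, []):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True


def grow_critical_set(edges, s0, unsafe, attacks, cost=None):
    """Start with C empty and add attacks, cheapest first, until C is critical.
    Assumes, as the text does, that the full attack set is critical."""
    cost = cost or {}
    c = set()
    for a in sorted(attacks, key=lambda a: (cost.get(a, 0), a)):
        if is_critical(edges, s0, unsafe, c):
            break
        c.add(a)
    return c


def prune_to_minimal(edges, s0, unsafe, c):
    """Assumed extra step: drop each attack whose removal keeps the set critical."""
    c = set(c)
    for a in sorted(c):
        if is_critical(edges, s0, unsafe, c - {a}):
            c.discard(a)
    return c


# Hypothetical graph: 0 -a-> 1 -b-> 3 and 0 -c-> 2 -b-> 3, state 3 unsafe.
EDGES = [(0, 1, "a"), (1, 3, "b"), (0, 2, "c"), (2, 3, "b")]
GROWN = grow_critical_set(EDGES, 0, {3}, {"a", "b", "c"})
PRUNED = prune_to_minimal(EDGES, 0, {3}, GROWN)
CHEAP = grow_critical_set(EDGES, 0, {3}, {"a", "b", "c"},
                          cost={"a": 1, "c": 1, "b": 5})
```

With no costs the loop adds a and then b and stops with {a, b}, which pruning reduces to {b}. With the assumed costs it picks the cheaper attacks a and c first and stops at {a, c}.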

Next, we show how the procedure described above can be carried out using model checking.
Assume that the set of attacks A is {a_1, ..., a_n}. We associate a boolean variable x_i with each attack a_i. If
attack a_i is activated (the intruder can use the attack), x_i = 1; otherwise x_i = 0. The variable x_i appears in the
precondition corresponding to the attack a_i. Initially, all x_i's are set to 1, representing that the set C is
empty. Notice that if the model checker returns a counter-example, then there is a path from the initial state
to an unsafe state. Recall that the specification is:

AG(¬unsafe)

Now in each step in the procedure, we pick an index i such that x_i = 1 and set x_i = 0. We stop the
first time the model checker no longer provides a counter-example. The set of attacks whose corresponding
variables are set to 0 represents a critical set. The worst-case complexity of this procedure is the same as the
one given before, but in practice symbolic model checkers, such as NuSMV, tend to perform efficiently.
Intuitively, we are using the model checker to implement the procedure isCritical(G, C).


4.4.  Probabilistic  Reliability  Analysis 

When  empirical  information  about  the  likelihood  of  certain  events  in  the  network  is  available,  we 
can  use  well  known  graph  algorithms  to  answer  quantitative  questions  about  the  attack  graph.  Suppose  we 
know  the  probabilities  of  some  transitions  in  the  scenario  graph.  After  appropriately  annotating  the  attack 
graph  with  these  probabilities,  we  can  interpret  it  as  a  Markov  Decision  Process  (see  [12]  for  details). 

The  standard  MDP  value  iteration  algorithm  [19]  computes  the  optimal  policy  for  selecting 
actions  in  an  MDP  that  results  in  maximum  benefit  (or  minimum  cost)  for  the  decision  maker.  Value 
iteration  can  compute  the  worst  case  probability  of  intruder  success  in  an  attack  graph  as  follows.  We 
assign  all  nodes  where  the  intruder’s  goal  has  been  achieved  the  benefit  value  of  1,  and  all  other  nodes  the 
benefit  value  of  0.  Then  we  run  the  value  iteration  algorithm.  The  algorithm  finds  the  optimal  attack 
selection  policy  for  the  intruder  and  assigns  the  expected  benefit  value  resulting  from  that  policy  to  each 
state  in  the  scenario  graph.  The  expected  value  is  a  fraction  of  1,  and  it  is  equivalent  to  the  probability  of 
getting  to  the  goal  state  from  that  node,  assuming  the  intruder  always  follows  the  optimal  policy. 
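A minimal Python sketch of this computation is given below. The dictionary encoding of the MDP, the state and action names, and the transition structure are invented for illustration and are not the model used in the experiments. Only the success probabilities 0.2, 0.5, and 0.05 are borrowed, as the complements of the detection probabilities 0.8, 0.5, and 0.95 quoted in the example that follows.

```python
def max_success_probability(actions, goal, sweeps=1000, tol=1e-12):
    """Value iteration for the intruder's best probability of reaching `goal`.

    actions maps state -> {action name -> [(probability, next state), ...]}.
    Probability mass not listed for an action is treated as failure
    (for example, the intruder is detected), which contributes 0.
    """
    states = set(actions) | set(goal)
    for acts in actions.values():
        for outcomes in acts.values():
            states.update(t for _, t in outcomes)
    # Benefit 1 for goal states, 0 for all other states.
    value = {s: 1.0 if s in goal else 0.0 for s in states}
    for _ in range(sweeps):
        delta = 0.0
        for s in states:
            if s in goal or not actions.get(s):
                continue  # goal and dead-end states keep their fixed value
            best = max(sum(p * value[t] for p, t in outcomes)
                       for outcomes in actions[s].values())
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < tol:
            break
    return value


# Hypothetical MDP: two ways to reach root on ip2 without being detected.
TOY_MDP = {
    "start": {"sshd_overflow": [(0.2, "on_ip1")],
              "ftp_rhosts": [(0.5, "trusted")]},
    "on_ip1": {"unmonitored_attack": [(1.0, "root_ip2")]},
    "trusted": {"remote_login": [(0.05, "user_ip2")]},
    "user_ip2": {"local_overflow": [(1.0, "root_ip2")]},
}
VALUES = max_success_probability(TOY_MDP, {"root_ip2"})
```

In this toy model the value at "start" is the larger of 0.2 (the sshd route) and 0.5 × 0.05 = 0.025 (the ftp route), so the optimal policy chooses the sshd route.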

We implemented the value iteration algorithm in an attack graph post-processor (“Reliability
Analyzer” of Figure 1) and ran it on a slightly modified version of our example. In the modified example
each attack has both detectable and stealthy variants. We assumed that for a typical network, a certain
percentage  of  attempted  intrusions  is  performed  by  sophisticated  attackers  who  keep  on  top  of  latest  IDS 
technology  and  use  stealthy  attack  variants.  We  arbitrarily  assigned  probabilities  of  detecting  each  atomic 
attack  as  follows:  0.8  for  sshd  buffer  overflow,  0.5  for  ftp  .rhosts,  0.95  for  the  remote  login,  and  0.2  for 
local  buffer  overflow.  The  intruder’s  goal  is  to  get  root  access  at  host  ip2  while  remaining  undetected. 
Accordingly,  the  states  where  this  goal  has  been  achieved  were  assigned  benefit  value  1. 

In this setup, the computed probability of intruder success is 0.2, and his best strategy is to attempt
sshd buffer overflow on host ip1, and then conduct the rest of the attack from that host. The only possibility
of detection is the sshd buffer overflow attack itself, since the IDS does not see the activity between hosts
ip1 and ip2.

The system administrator can use this technique to evaluate the effectiveness of various security fixes.
For instance, installing an additional IDS component to monitor the network traffic between hosts ip1 and
ip2 reduces the probability of the intruder remaining undetected to 0.025; installing a host-based IDS on
host ip2 reduces the probability to 0.16. Other things being equal, this is an indication that the former
remedy is more effective.

5.  Related  Work 

The  work  by  Phillips  and  Swiler  [18]  is  the  closest  to  ours.  They  propose  the  concept  of  attack 
graphs  that  is  similar  to  the  one  described  here.  However,  they  take  an  “attack-centric”  view  of  the  system. 
Since  we  work  with  a  general  modeling  language,  we  can  express  in  our  model  both  seemingly  benign 
system  events  (such  as  failure  of  a  link)  and  malicious  events  (such  as  attacks).  Therefore,  our  attack  graphs 
are more general than those proposed by Phillips and Swiler. More recently, Swiler et al. described a tool [23]
for  generating  attack  graphs  based  on  their  previous  work.  Their  tool  constructs  the  attack  graph  by  forward 
exploration  starting  from  the  initial  state.  A  symbolic  model  checker  (like  NuSMV)  works  backward  from 
the  goal  state  to  construct  the  attack  graph.  A  major  advantage  of  the  backward  algorithm  is  that 
vulnerabilities  that  are  not  relevant  to  the  safety  property  (or  the  goal  of  the  intruder)  are  never  explored. 
Our  approach  can  result  in  significant  savings  in  space.  (Swiler  et  al.  refer  to  the  advantages  of  the 
backward  search  in  their  paper  [23].)  More  generally,  the  advantage  of  using  model  checking  instead  of 
forward  search  is  that  the  technique  can  be  expanded  to  include  liveness  properties,  which  can  model 
service  guarantees  in  the  face  of  malicious  activity. 

Moreover,  by  using  model  checking  we  leverage  all  the  advanced  techniques  developed  in  that 
area.  For  example,  the  cone  of  influence  reduction  [14]  in  model  checking  abstracts  away  part  of  the  system 
that  is  not  relevant  to  the  specification.  In  our  context,  if  there  is  a  vulnerability  that  is  not  relevant  to  a 
safety  property,  it  will  not  be  considered  during  model  checking.  Finally,  the  attack  graph  analysis 
suggested  by  Phillips  and  Swiler  is  different  from  the  ones  presented  in  this  paper.  We  plan  to  incorporate 
their  analysis  into  our  tool  suite. 

Templeton  and  Levitt  [24]  propose  a  requires/provides  model  for  attacks.  The  model  links  atomic 
attacks  into  scenarios,  with  earlier  atomic  attacks  supplying  the  prerequisites  for  the  later  ones.  Templeton 
and  Levitt  point  out  that  relating  seemingly  innocuous  system  behavior  to  known  attack  scenarios  can  help 
discover  new  atomic  attacks.  However,  they  do  not  consider  combining  their  attack  scenarios  into  attack 
graphs. 


Dacier [8] proposes the concept of privilege graphs. Each node in the privilege graph represents a
set of privileges owned by the user; edges represent vulnerabilities. Privilege graphs are then explored to
construct attack state graphs, which represent different ways in which an intruder can reach a certain goal,
such as root access on a host. He also defines a metric, called the mean effort to failure or METF, based on
the attack state graphs. Ortalo et al. describe an experimental evaluation of a framework based on these
ideas [17]. On the surface, our notion of attack graphs seems similar to the one proposed by Dacier.
However, as is the case with Phillips and Swiler, Dacier takes an “attack-centric” view of the world. As
pointed out above, our attack graphs are more general. From the experiments conducted by Ortalo et al. it
appears that even for small examples the space required to construct attack state graphs becomes
prohibitive. By basing our algorithm on model checking we take advantage of advances in representing
large state spaces and can thus hope to represent large attack graphs. We can perform the analytical
analysis proposed by Dacier on attack graphs constructed by our tool. We also plan to conduct an
experimental evaluation similar to the one performed by Ortalo et al.

Ritchey and Ammann [20] also use model checking for vulnerability analysis of networks. They
use the (unmodified) model checker SMV [22]. They can obtain only one counter-example, i.e., only one
attack corresponding to an unsafe state. In contrast, we modified the model checker NuSMV to produce
attack graphs, representing all possible attacks. We also described post-facto analyses that can be
performed on these attack graphs. These analysis techniques cannot be meaningfully performed on single
attacks.


Graph-based data structures have also been used in network intrusion detection systems, such as
NetSTAT [25]. There are two major components in NetSTAT: a set of probes placed at different points in
the network and an analyzer. The analyzer processes events generated by the probes and generates alarms
by consulting a network fact base and a scenario database. The network fact base contains information
(such as connectivity) about the network being monitored. The scenario database has a directed graph
representation of various atomic attacks. For example, the graph corresponding to an IP spoofing attack
shows various steps that an intruder takes to mount that specific attack. The authors state that “in the
analysis process the most critical operation is the generation of all possible instances of an attack scenario
with respect to a given target network.” Therefore, we believe that our tool can help network intrusion
detection systems, such as NetSTAT, in automatically producing attack scenarios. We leave this as a future
direction for research.

Cuppens and Ortalo [7] propose a declarative language (LAMBDA) for specifying attacks in
terms of pre- and postconditions. LAMBDA is a superset of the simple language we used to model attacks
in our work. The language is modular and hierarchical; higher-level attacks can be described using
lower-level attacks as components. LAMBDA also includes intrusion detection elements. Attack specifications
include information about the steps needed to detect the attack and the steps needed to verify that the attack
has already been carried out. We are studying the possibility of converting our representation of attacks to
LAMBDA.

6.  Future  Work 

We have so far restricted our work to only safety (invariant) properties. To exploit the full power
of model checking, we need a method of generating attack graphs for more general classes of properties.
For example, the following liveness property states that a user will always be able to access a server
whenever he wants to.

AG(server.user.request → AF(server.user.access))

This property would not be true if the server can be disabled using a denial-of-service attack. We
plan to explore generation of attack graphs for universally quantified fragments of Computation Tree
Logic and Linear Temporal Logic.

To make our tool suite more usable by security experts and system administrators, we see the
value of building a library of specifications of atomic attacks. Our hope is that increasing this arsenal of
specifications outpaces the growth in the arsenal of known attacks. Furthermore, one reason model
checking has been so successful is that it discovers unknown bugs in hardware circuits and protocols 1.
Analogously, by using our tool suite based on the power of model checking techniques, we can potentially
discover new, unexpected attacks, and hence identify new network vulnerabilities.

In principle, our technique is not limited to modeling attacks only. The expressive power of model
checkers lets us model benign system activity as well. We believe that the ability of modern model
checkers to handle more complex properties can be adapted to our tool. For example, “liveness” properties
such as “a legitimate user’s transaction will finish despite intruder interference” are easily specified in
temporal logic and checked by a model checker. Unlike invariants, such properties cannot be handled by
simple reachability analysis or other classical graph algorithms. Adapting the power of model checking to
analyze such properties opens a promising research direction in automated security analysis.

A. Exhaustive and Succinct Attack Graphs

Lemma 1: (a) (Exhaustive) An execution e of the input model (S, R, S_0, L) violates the property p =
AG(¬unsafe) if and only if e is an attack in the attack graph G = (S_unsafe, R^p, S_0^p, S_s^p).

(b) (Succinct states) A state s of the input model (S, R, S_0, L) is in the attack graph G if and only if there is
an attack in G that contains s.

(c) (Succinct transitions) A transition t = (s_1, s_2) of the input model (S, R, S_0, L) is in the attack graph G if
and only if there is an attack in G that includes t.

Proof: 

(a) (⇒) Let e = s_0 t_0 ... t_{n-1} s_n be a (finite) execution of the input model such that s_n is an unsafe state.
To prove that e is an attack in G, it is sufficient to show (1) s_0 ∈ S_0^p, (2) s_n ∈ S_s^p, and (3) for all
0 ≤ k ≤ n, s_k ∈ S_unsafe, and for all 0 ≤ k < n, t_k ∈ R^p.

Since unsafe holds at s_n and for all k there is a path from s_k to s_n in the input model, by definition
every s_k along e violates AG(¬unsafe). Therefore, by construction, every s_k is in S_unsafe and every t_k
is in R^p. (1), (2), and (3) follow immediately.

(⇐) Suppose that e = s_0 t_0 ... t_{n-1} s_n is an attack in the attack graph G. By construction, all states and
transitions of e are also states and transitions in the input model. Since e is an attack, s_0 ∈ S_0^p and s_n ∈ S_s^p.
Therefore, s_0 ∈ S_0 and s_n ∈ S_unsafe. So e is an execution of the input model, its first state is an initial state of the
model, and p is false in its final state. It follows that e violates the property AG(¬unsafe).

(b) (⇒) By construction of the algorithm in Figure 2, all states generated for the attack graph are
reachable from an initial state, and all of them violate AG(¬unsafe). Therefore, for any such state s in the
input model, there is a path e_1 from an initial state to s, and there is a path e_2 from s to an unsafe state.

The concatenation of e_1 and e_2 is an execution e of the input model that violates AG(¬unsafe). By
Lemma 1a, e is an attack in G. Since e contains s, the proof is complete.

(⇐) If there is an attack in G that contains s, then trivially s is in G.

(c) (⇒) By Lemma 1b, there is an attack e_1 = q_0 t_0 ... s_1 ... t_{m-1} q_m that contains state s_1 and an attack
e_2 = r_0 u_0 ... s_2 ... u_{n-1} r_n that contains state s_2. So the following attack includes both states s_1 and
s_2 and the transition t: e = q_0 t_0 ... s_1 t s_2 ... u_{n-1} r_n.

(⇐) If there is an attack in G that contains t, then trivially t is in G.
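Parts (b) and (c) of the lemma suggest a simple explicit-state characterization, sketched below in Python: a state belongs to the attack graph exactly when it is reachable from an initial state and can itself reach an unsafe state. This is only an illustration on an assumed toy model; the function names and the example transitions are hypothetical, and the tool described in this paper works symbolically rather than by explicit search.

```python
from collections import deque


def reach(starts, succ):
    """All states reachable from `starts` along the successor map `succ`."""
    seen, queue = set(starts), deque(starts)
    while queue:
        s = queue.popleft()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def attack_graph(transitions, initial, unsafe):
    """Keep only states and transitions that lie on some path from an
    initial state to an unsafe state."""
    succ, pred = {}, {}
    for s, t in transitions:
        succ.setdefault(s, []).append(t)
        pred.setdefault(t, []).append(s)
    forward = reach(initial, succ)    # reachable from an initial state
    backward = reach(unsafe, pred)    # able to reach an unsafe state
    states = forward & backward
    edges = {(s, t) for s, t in transitions if s in forward and t in backward}
    return states, edges


# Hypothetical model: state 3 is a dead end, state 4 is unreachable,
# and state 5 lies beyond the unsafe state 2.
TRANSITIONS = [(0, 1), (1, 2), (0, 3), (4, 2), (2, 5)]
STATES, EDGES = attack_graph(TRANSITIONS, {0}, {2})
```

Here only states 0, 1, 2 and transitions (0, 1), (1, 2) survive, since every other state or transition fails to lie on a path from the initial state to the unsafe state.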


Figure 6. Attack graph corresponding to the set cover problem.

B. NP-Completeness of MCSA


Given an attack graph G = (S, E, s_0, L), consider the problem of finding a minimum critical set of
attacks M(s_0). We will call this problem MCSA or the minimum critical set of attacks problem. We prove
that the decision version of the problem is NP-complete.

Lemma 2: Assume that we are given an attack graph G = (S, E, s_0, L) and an integer k. The problem of
determining whether there is a critical set C(s_0) such that |C(s_0)| ≤ k is NP-complete.

Proof: First, we prove that the problem is in NP. Guess a set C ⊆ A with size ≤ k. We need to check that C
is a critical set of attacks.

This can be accomplished in polynomial time using the procedure isCritical(G, C) described
in Section 4.3. Therefore, the problem is in NP.

Next, we prove that the problem is NP-hard. The reduction is from the minimum cover problem
[11, Page 222]. In the minimum cover problem one is given a collection C of subsets of a finite set U and a
positive integer k ≤ |C|. The problem is to determine whether C contains a cover for U of size k or less,
i.e., a subset C' ⊆ C with |C'| ≤ k such that every element of U belongs to at least one member of C'. We
construct an attack graph G_C corresponding to the collection C. The set of attacks A is equal to C. The
attack graph G_C has an initial state s_0 and a final state s_f that is unsafe. Let U = {u_1, ..., u_z} and let
c_1, ..., c_m be an enumeration of the collection C. For each subset c_i, where i < m, we have z new states
s_{i,1}, ..., s_{i,z}. There is an edge from s_0 to all the states s_{1,1}, ..., s_{1,z} corresponding to the subset c_1.
There is an edge from s_{i,j} to s_{i+1,j} for all i < m-1 and 1 ≤ j ≤ z. From each state in the set
{s_{m-1,1}, ..., s_{m-1,z}} there is an edge to the unsafe state s_f. The label of the edge with tail s_{i,j} is c_i
if u_j ∈ c_i; otherwise the label is ε. The label of the edge with head s_{1,j} is c_m if u_j ∈ c_m; otherwise the
label is ε. It is easy to prove that there is a critical set of attacks C' such that |C'| ≤ k if and only if there is
a cover of size less than or equal to k.

We give a short example to illustrate the reduction. Consider a set U = {u_1, u_2, u_3}. Suppose that the
collection C consists of the following subsets:
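The construction can be sketched in Python as follows. The encoding (one chain of m edges per element of U, None for the ε label), the function names, and the three-subset instance are assumptions chosen for illustration; in particular the instance is hypothetical and is not the example of Figure 6. The brute-force check at the end only confirms, on this small instance, that a set of subset names is critical exactly when it is a cover.

```python
from collections import deque
from itertools import combinations


def is_critical(edges, s0, unsafe, removed):
    """True if no unsafe state is reachable once edges labeled in `removed` are deleted."""
    adj = {}
    for src, dst, label in edges:
        if label not in removed:  # None (epsilon) edges are never deleted
            adj.setdefault(src, []).append(dst)
    seen, queue = {s0}, deque([s0])
    while queue:
        s = queue.popleft()
        if s in unsafe:
            return False
        for t in adj.get(s, []):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True


def build_cover_graph(universe, collection):
    """collection is a list of (name, set) pairs c_1..c_m.
    For each element u there is a chain s0 -> (1,u) -> ... -> (m-1,u) -> sf.
    The edge into (1,u) is labeled c_m, the edge leaving (i,u) is labeled c_i,
    and a label is replaced by None when u is not in the corresponding subset."""
    m = len(collection)
    edges = []
    for u in universe:
        chain = ["s0"] + [(i, u) for i in range(1, m)] + ["sf"]
        for k in range(m):
            name, members = collection[m - 1] if k == 0 else collection[k - 1]
            edges.append((chain[k], chain[k + 1], name if u in members else None))
    return edges, "s0", "sf"


def reduction_agrees(universe, collection):
    """Brute-force check: a set of names is critical exactly when it is a cover."""
    edges, s0, sf = build_cover_graph(universe, collection)
    sets = dict(collection)
    for k in range(len(sets) + 1):
        for chosen in combinations(sorted(sets), k):
            covered = set().union(*(sets[n] for n in chosen))
            if is_critical(edges, s0, {sf}, set(chosen)) != (covered >= set(universe)):
                return False
    return True


# Hypothetical instance: {c1, c2} is the only cover of size 2.
UNIVERSE = {1, 2, 3}
COLLECTION = [("c1", {1, 2}), ("c2", {3}), ("c3", {2})]
EDGES, S0, SF = build_cover_graph(UNIVERSE, COLLECTION)
```

Each chain can be cut only by a subset that contains its element, so cutting all chains requires choosing subsets that together contain every element of U.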


Cl  = 

Oj  = 

Oj  = 

{^■2} 

Notice that there is a cover of size 2, i.e., c_1 and c_2 form a cover. The attack graph corresponding to this
problem is shown in Figure 6. The set of attacks is {c_1, c_2, c_3}. The set of attacks {c_1, c_2} is critical because
every path from s_0 to the unsafe state uses at least one edge with the label in the set {c_1, c_2}.

Abstract


An  attack  graph  is  a  succinct  representation  of  all  paths  through  a  system  that  end  in  a  state  where  an 
intruder  has  successfully  achieved  his  goal.  Today  Red  Teams  determine  the  vulnerability  of  networked 
systems  by  drawing  gigantic  attack  graphs  by  hand.  Constructing  attack  graphs  by  hand  is  tedious,  error- 
prone,  and  impractical  for  large  systems.  By  viewing  an  attack  as  a  violation  of  a  safety  property,  we  can 
use  model  checking  to  produce  attack  graphs  automatically:  a  successful  path  from  the  intruder’s 
viewpoint  is  a  counterexample  produced  by  the  model  checker.  In  this  paper  we  present  an  algorithm  for 
generating  attack  graphs  using  model  checking. 

Security  analysts  use  attack  graphs  for  detection,  defense,  and  forensics.  In  this  paper  we  present  a 
minimization  technique  that  allows  analysts  to  decide  which  minimal  set  of  security  measures  would 
guarantee  the  safety  of  the  system.  We  provide  a  formal  characterization  of  this  problem:  we  prove  that  it  is 
polynomially  equivalent  to  the  minimum  hitting  set  problem  and  we  present  a  greedy  algorithm  with 
provable  bounds.  We  also  present  a  reliability  technique  that  allows  analysts  to  perform  a  simple  cost- 
benefit  analysis  depending  on  the  likelihoods  of  attacks.  By  interpreting  attack  graphs  as  Markov  Decision 
Processes we can use a standard MDP value iteration algorithm to compute the probabilities of intruder
success for each attack in the graph.

We  illustrate  our  work  in  the  context  of  a  small  example  that  includes  models  of  a  firewall  and  an  intrusion 
detection  system. 


As networks of hosts continue to grow, evaluating their vulnerability to attack becomes
increasingly important to automate. When evaluating the security of a network, it is not enough to
consider the presence or absence of isolated vulnerabilities. A large network builds upon multiple platforms
and diverse software packages and supports several modes of connectivity. Inevitably, such a network will
contain security holes that have escaped the notice of even the most diligent system administrator.


Figure  1:  Vulnerability  Analysis  of  a  Network 


To  evaluate  the  vulnerability  of  a  network  of  hosts,  a  security  analyst  must  take  into  account  the 
effects of interactions of local vulnerabilities and find global vulnerabilities introduced by interconnections.
A  typical  process  for  vulnerability  analysis  of  a  network  is  shown  in  Figure  1.  First,  scanning  tools 
determine  vulnerabilities  of  individual  hosts.  Using  this  local  vulnerability  information  along  with  other 
information  about  the  network,  such  as  connectivity  between  hosts,  the  analyst  produces  an  attack  graph. 
Each  path  in  an  attack  graph  is  a  series  of  exploits,  which  we  call  atomic  attacks,  that  leads  to  an 
undesirable  state  (e.g.,  a  state  where  an  intruder  has  obtained  administrative  access  to  a  critical  host). 


1.1  Attack  Graphs  and  Intrusion  Detection 

Attack  graphs  can  serve  as  a  basis  for  detection,  defense,  and  forensic  analysis.  To  motivate  our 
study  of  the  generation  and  analysis  of  attack  graphs,  we  discuss  the  potential  applications  of  attack  graphs 
to  these  areas  of  security. 


Detection 


System administrators are increasingly deploying intrusion detection systems (IDSs) to detect and
combat attacks on their network. Such systems depend on software sensor modules that first detect
suspicious events and activity and then issue alerts. Setting up the sensors involves a trade-off between
sensitivity to intrusions and the rate of false alarms in the alert stream. When the sensors are set to report all
suspicious events, the sensors frequently issue alerts for benign background events. Frequent false alarms
result in administrators turning off the IDS entirely. On the other hand, decreasing sensor sensitivity
reduces the sensors' ability to detect real attacks.

To  address  this  trade-off,  many  intrusion  detection  systems  employ  heuristic  algorithms  to 
correlate alerts from a large pool of heterogeneous sensors. Valdes and Skinner [VS01] describe a
probabilistic  approach  to  alert  correlation.  Successful  correlation  of  multiple  alerts  increases  the  chance  that 
the  suspicious  activity  indicated  by  the  alerts  is  in  fact  malicious. 

Attack  graphs  can  enhance  both  heuristic  and  probabilistic  correlation  approaches.  Given  a  graph 
describing  all  likely  attacks  (i.e.,  sequences  of  attacker  actions),  an  IDS  can  match  individual  alerts  to 
attack  edges  in  the  graph.  Matching  successive  alerts  to  individual  paths  in  the  attack  graphs  dramatically 
increases  the  likelihood  that  the  network  is  under  attack.  This  on-line  vigilance  allows  the  IDS  to  predict 
attacker  goals,  aggregate  alarms  to  reduce  the  volume  of  alert  information  to  be  analyzed,  and  reduce  the 
false  alarm  rates.  Knowledge  of  attacker  goals  and  likely  next  steps  helps  guide  defensive  response. 

In  this  paper  we  show  how  to  generate  attack  graphs  automatically  from  models  of  the  network; 
our  models  are  expressive  enough  to  reflect  the  administrator’s  choice  of  security  policy  for  an  IDS  and  his 
choice  of  network  configuration.  Attack  graphs  enable  an  administrator  to  perform  several  kinds  of 
analyses  to  assess  their  security  needs:  marking  the  paths  in  the  attack  graph  that  an  IDS  will  detect; 
determining  where  to  position  new  IDS  components  for  best  coverage;  exploring  trade-offs  between 
different security policies and between different software/hardware configurations; and identifying the
worst-case scenarios and prioritizing defense strategy accordingly.

Forensics


After a break-in, forensic analysis is used to find probable attacker actions and to assess damage.
If legal action is desired, analysts seek evidence that a sequence of sensor alerts comprises a coherent attack
plan, and is not merely a series of isolated, benign events. This task becomes even harder when the
intruders obfuscate attack steps by slowing down the pace of the attack and varying specific steps. We can
construct a convincing argument as to the malicious intent of intruder actions by matching data extracted
from IDS logs to a formal reference model based on attack graphs [Ste].

Given that attack graphs can be used to perform a variety of analyses, we can use them to answer
the  following  kinds  of  questions,  of  particular  interest  to  system  administrators: 

Question  1:  What  successful  attacks  are  undetected  by  the  IDS? 

Question  2:  If  all  measures  for  protecting  a  network  are  deployed,  does  the  system  become  safe? 

Question  3:  Given  a  set  of  measures  M,  what  is  the  smallest  subset  of  measures  M'  whose  deployment 
makes  the  system  safe? 

Answers to these questions can help a system or network administrator choose the best upgrade
strategy.  We  address  these  questions  in  Section  5. 

When  we  are  modeling  a  system  operating  in  an  unpredictable  environment,  certain  transitions  in 
the  model  represent  the  system’s  reaction  to  changes  in  the  environment.  We  can  think  of  such  transitions 
as  being  outside  of  the  system’s  control — they  occur  when  triggered  by  the  environment.  When  no 
empirical  information  is  available  about  the  relative  likelihood  of  such  environment-driven  transitions,  we 
can  model  them  only  as  nondeterministic  “choices”  made  by  the  environment.  Moreover,  for  new 
vulnerabilities, data for estimating likelihoods might not be available. However, sometimes empirical data
make  it  possible  to  assign  probabilities  to  environment-driven  transitions.  We  would  like  to  take  advantage 
of  such  quantitative  information  added  appropriately  to  attack  graphs.  In  this  context,  a  system 
administrator  might  be  interested  in  answering  the  following  question: 

Question  4:  The  deployment  of  which  security  measure(s)  will  increase  the  likelihood  of  thwarting  an 
attacker? 


The  system  administrator  can  use  the  answer  to  question  4  to  perform  a  quantitative  evaluation  of 
various  security  fixes.  We  address  this  question  in  Section  6.2. 

1.2  Our  Contributions 

Constructing  attack  graphs  is  a  crucial  part  of  performing  vulnerability  analysis  of  a  network  of 
hosts. Currently, Red Teams produce attack graphs by hand, often drawing gigantic diagrams on
floor-to-ceiling whiteboards. Doing this by hand is tedious, error-prone, and impractical for attack graphs
larger than a hundred nodes.

The main contributions of our work, some of which have appeared in an earlier paper [SHJ_02],
are:

•  We  demonstrate  how  model  checking  can  be  applied  to  generate  attack  graphs  automatically.  We 
show  that  the  attack  graphs  produced  by  our  method  are  exhaustive,  i.e.,  covering  all  possible 
attacks,  and  succinct,  i.e.,  containing  only  relevant  states  and  transitions  (see  Section  3.2). 

• Each state transition corresponds to a single atomic attack by the intruder. A state in the model
represents the state of the system between atomic attacks. A typical transition from state $s_1$ to state
$s_2$ corresponds to an atomic attack whose preconditions are satisfied in $s_1$ and whose effects hold
in state $s_2$. An attack is a sequence of state transitions culminating in the intruder achieving his
goal. The entire attack graph is thus a representation of all the possible ways in which the intruder
can succeed.

•  We  prove  that  finding  a  minimum  set  of  atomic  attacks  that  must  be  removed  to  thwart  an  intruder 
is  NP-complete.  Beyond  the  proof  sketched  in  our  earlier  paper  [SHJ_02],  here  we  further  explore 
the  complexity  of  this  problem.  Section  5.2.1  proves  that  the  problem  is  polynomially  equivalent 
to  the  minimum  hitting  set  problem  where  the  collection  of  sets  is  represented  as  a  labeled  directed 
graph.  This  reduction  provided  us  with  additional  insight,  enabling  us  to  find  a  greedy  algorithm 
with  provable  bounds,  which  can  be  used  to  answer  questions  1,  2,  and  3. 

• We present an algorithm to compute the reliability, defined as the likelihood of an intruder not
succeeding, of a networked system. An advantage of our algorithm is that it allows incomplete
information, i.e., probabilities of all transitions need not be provided. To our knowledge, previous
metrics in the area of security require complete information. We can use this algorithm to answer
question 4 precisely.

We  present  related  work  in  Section  2.  Section  3  describes  our  model  and  our  algorithm  to  generate 
attack  graphs.  We  give  details  of  an  example  networked  system  in  Section  4  and  use  it  throughout  the 
paper  for  illustrative  purposes.  In  Section  5  we  present  a  minimization  analysis  to  help  administrators 
decide  what  measures  to  deploy  to  thwart  attacks.  In  Section  6  we  present  a  reliability  analysis  over 
probabilistic  attack  graphs  based  on  the  value  iteration  algorithm  defined  for  Markov  Decision 
Processes;  this  analysis  can  help  administrators  determine  how  deployment  of  one  measure  can 
decrease  the  likelihood  of  certain  attacks.  Finally,  we  present  a  brief  summary  and  directions  for  future 
work  in  Section  7. 

2  Related  Work 

Phillips  and  Swiler  [PS98]  propose  a  concept  of  attack  graphs  similar  to  the  one  we  describe. 
However, they model only attacks. Since we have a generic state machine model, we can simultaneously
model not just attacks, but also seemingly benign system events (e.g., link failures and user errors) and
even system administrator recovery actions. Therefore, our attack graphs are more general than the ones
proposed by Phillips and Swiler. They also built a tool for generating attack graphs [SPEC00]; it constructs
the attack graph by forward exploration starting from the initial state. In our work, we use a symbolic
model checker (i.e., NuSMV) that works backward from the goal state to construct the attack graph. A
major advantage of the backward algorithm is that vulnerabilities that are not relevant to the safety property
(or the goal of the intruder) are never explored; this technique can result in significant savings in space. In
fact, Swiler et al. [SPEC00] refer to the advantages of the backward search in their paper. Finally, the
post-facto analysis suggested by Phillips and Swiler is also different from the ones we present in this paper. We
plan  to  incorporate  their  analysis  into  our  tool  suite. 

Dacier  [Dac94]  proposes  the  concept  of  privilege  graphs,  where  each  node  represents  a  set  of 
privileges  owned  by  the  user  and  arcs  represent  vulnerabilities.  Privilege  graphs  are  then  explored  to 
construct  attack  state  graphs,  which  represent  different  ways  in  which  an  intruder  can  reach  a  certain  goal, 
such  as  root  access  on  a  host.  Dacier  proposes  a  metric,  called  the  mean  effort  to  failure  or  METF,  based  on 
the attack state graphs. Ortalo et al. [ODK99] describe an experimental evaluation of this framework. On
the surface our notion of attack graphs seems similar to Dacier’s. However, as is the case with Phillips and
Swiler, Dacier takes an “attack-centric” view of the world; again, our attack graphs are more general. From
the experiments conducted by Ortalo et al. it appears that even for small examples the space required to
construct attack state graphs becomes prohibitive. Model checking has made significant advances in
representing large state spaces. Therefore, by basing our algorithm on model checking we leverage
those advances and can hope to represent large attack graphs.

Ritchey and Ammann [RA01] also used model checking for vulnerability analysis of networks.
They used the unmodified model checker SMV [SMV]. Therefore, they could only obtain one
counterexample, or one attack, corresponding to an intruder’s goal. In contrast, we modified the model checker
NuSMV to produce complete attack graphs, which represent all possible attacks. We also describe
analyses that can be performed on these attack graphs. These analyses cannot be meaningfully performed
on single attacks.

3 Generating Attack Graphs Using Model Checking

First,  we  formally  define  attack  graphs,  the  data  structure  used  to  represent  all  possible  attacks  on 
our  networked  system.  We  restrict  our  attention  to  attack  graphs  representing  violations  of  safety 
properties¹.

Definition 1 Let $AP$ be a set of atomic propositions. An attack graph or AG is a tuple $G = (S, \tau, S_0, S_s, L)$,
where $S$ is a set of states, $\tau \subseteq S \times S$ is a transition relation, $S_0 \subseteq S$ is a set of initial states,
$S_s \subseteq S$ is a set of success states, and $L: S \to 2^{AP}$ is a labeling of states with a set of propositions true
in that state.

Unless stated otherwise, we assume that the transition relation $\tau$ is total. We define an execution
fragment as a finite sequence of states $s_0 s_1 \ldots s_n$ such that $(s_i, s_{i+1}) \in \tau$ for all $0 \le i < n$. An execution
fragment with $s_0 \in S_0$ is an execution, and an execution whose final state is in $S_s$ is an attack, i.e., the execution
corresponds to a sequence of atomic attacks leading to the intruder’s goal state. Intuitively, $S_s$ denotes all
states where the intruder has achieved his goal, e.g., obtaining root access on a critical host.
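
Definition 1 translates almost directly into a small data structure. The following is a minimal Python sketch, not part of our tool suite; the class name `AttackGraph`, its method names, and the three-state `toy` graph are illustrative assumptions, and the toy transition relation is deliberately not total.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Sequence, Set, Tuple

State = Hashable


@dataclass
class AttackGraph:
    """Explicit-state rendering of the tuple G = (S, tau, S0, Ss, L)."""
    states: Set[State]                       # S
    transitions: Set[Tuple[State, State]]    # tau, a subset of S x S
    initial: Set[State]                      # S0
    success: Set[State]                      # Ss
    labels: Dict[State, FrozenSet[str]]      # L: S -> 2^AP

    def is_execution_fragment(self, seq: Sequence[State]) -> bool:
        # Non-empty, every state is in S, and every consecutive pair is in tau.
        return (
            len(seq) > 0
            and all(s in self.states for s in seq)
            and all((a, b) in self.transitions for a, b in zip(seq, seq[1:]))
        )

    def is_execution(self, seq: Sequence[State]) -> bool:
        # An execution fragment that starts in an initial state.
        return self.is_execution_fragment(seq) and seq[0] in self.initial

    def is_attack(self, seq: Sequence[State]) -> bool:
        # An execution whose final state is a success state.
        return self.is_execution(seq) and seq[-1] in self.success


# Hypothetical three-state example (labels are made up for illustration).
toy = AttackGraph(
    states={"s0", "s1", "s2"},
    transitions={("s0", "s1"), ("s1", "s2")},
    initial={"s0"},
    success={"s2"},
    labels={
        "s0": frozenset(),
        "s1": frozenset({"user_on_host_1"}),
        "s2": frozenset({"root_on_host_2"}),
    },
)
```

Under these assumptions, `toy.is_attack(["s0", "s1", "s2"])` holds, while `["s0", "s1"]` is an execution but not an attack.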

Next  we  turn  our  attention  to  algorithms  for  automatic  generation  of  attack  graphs  and  properties 
that  we  can  guarantee  of  them.  Starting  with  a  description  of  a  network  model  M  and  a  security  property  p, 
the  task  is  to  construct  an  attack  graph  representing  all  executions  of  M  that  violate  p.  These  are  the 
successful  attacks.  For  the  kinds  of  attack  graph  analyses  suggested  in  Section  1,  it  is  essential  that  the 
graphs  produced  by  the  algorithms  be  exhaustive  and  succinct.  An  attack  graph  is  exhaustive  with  respect 
to  a  model  M  and  correctness  property  p  if  it  covers  all  possible  attacks  in  M  leading  to  a  violation  of p,  and 
succinct  if  it  only  contains  those  states  and  transitions  of  M  that  lead  to  a  state  violating  p. 

3.1  Reachability  Analysis 

If  we  restrict  ourselves  to  safety  properties,  an  attack  graph  may  be  constructed  by  performing  a 
simple  statespace  search.  Starting  with  the  initial  states  of  the  model  M,  we  use  a  graph  traversal  procedure 
(e.g.,  depth  first  search)  to  find  all  reachable  success  states  where  the  safety  property  p  is  violated.  The 
attack  graph  is  the  union  of  all  paths  from  initial  states  to  success  states. 
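
As an illustration of this search, the sketch below works on an explicitly enumerated transition relation. It is only a sketch under that assumption: the function names `reachable` and `attack_graph_by_search` and the predicate `is_unsafe` are hypothetical. It relies on the observation that a state lies on some path from an initial state to a success state exactly when it is reachable from an initial state and can itself reach a success state.

```python
from collections import defaultdict


def reachable(starts, edges):
    """Iterative depth-first search: all states reachable from `starts`."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    seen, stack = set(starts), list(starts)
    while stack:
        state = stack.pop()
        for nxt in succ[state]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def attack_graph_by_search(initial, edges, is_unsafe):
    """Union of all paths from initial states to reachable unsafe states."""
    forward = reachable(initial, edges)
    goals = {s for s in forward if is_unsafe(s)}
    # Searching the reversed graph finds every state that can reach a goal.
    backward = reachable(goals, {(b, a) for a, b in edges})
    nodes = forward & backward
    kept = {(a, b) for a, b in edges if a in nodes and b in nodes}
    return nodes, kept, goals


# Hypothetical five-state model: state 2 is unsafe, state 4 is unreachable.
nodes, kept, goals = attack_graph_by_search(
    initial={0},
    edges={(0, 1), (1, 2), (0, 3), (4, 2)},
    is_unsafe=lambda s: s == 2,
)
```

In this toy model state 3 is reachable but cannot lead to the unsafe state, and state 4 can lead to it but is not reachable, so neither appears in the result.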

While this algorithm has the advantage of simplicity, it handles only safety properties and may run
into the state explosion problem for non-trivial models. Model checking has dealt with both of these issues
with some success, so we will consider algorithms based on that technology.

3.2  Model  Checking  Algorithm 

Model  checking  is  a  technique  for  checking  whether  a  formal  model  M  of  a  system  satisfies  a 
given  property  p.  In  our  work,  we  use  the  model  checker  NuSMV  [NuS],  for  which  the  model  M  is  a  finite 
labeled  transition  system  and  p  is  a  property  expressed  in  Computation  Tree  Logic  (CTL).  For  now,  we 
consider only safety properties, which in CTL have the form AG f (i.e., p = AG f, where f is a formula in
propositional  logic).  If  the  model  M  satisfies  the  property  p,  NuSMV  reports  “true.”  If  M  does  not  satisfy  p, 
NuSMV  produces  a  counter-example.  A  single  counter-example  shows  an  execution  that  leads  to  a 
violation  of  the  property.  In  this  section,  we  explain  how  to  construct  attack  graphs  for  safety  properties 
using  model  checking. 


¹ We say more on liveness properties in Section 7.

Figure 2: Algorithm for Generating Attack Graphs

Attack  graphs  depict  ways  in  which  the  system  can  reach  an  unsafe  state  (or,  equivalently,  a 
successful  state  for  the  intruder).  We  can  express  the  property  that  an  unsafe  state  cannot  be  reached  as: 

AG(¬unsafe)

When  this  property  is  false,  there  are  unsafe  states  that  are  reachable  from  the  initial  state.  The 
precise  meaning  of  unsafe  depends  on  the  application.  For  example,  in  the  network  security  example  given 
in  Section  4,  the  property  given  below  is  used  to  express  that  the  privilege  level  of  the  intruder  on  the  host 
with  index  2  should  always  be  less  than  the  root  (administrative)  privilege. 

AG(network.adversary.privilege[2]  <  network.priv.root) 

We briefly describe the algorithm (see Figure 2) for constructing attack graphs for the property
AG(¬unsafe). The first step is to determine the set of states $S_r$ that are reachable from the initial state. Next,
the algorithm computes the set of reachable states $S_{unsafe}$ that have a path to an unsafe state. The set of states
$S_{unsafe}$ is computed using an iterative algorithm derived from a fix-point characterization of the AG operator
[CGP00]. Let $R$ be the transition relation of the model, i.e., $(s, s') \in R$ if and only if there is a transition from
state $s$ to $s'$. By restricting the domain and range of $R$ to $S_{unsafe}$ we obtain a transition relation $R^p$ that
represents the edges of the attack graph. Therefore, the attack graph is $(S_{unsafe}, R^p, S_0^p, S_s^p, L)$, where $S_{unsafe}$
and $R^p$ represent the set of nodes and edges of the graph respectively; $S_0^p = S_0 \cap S_{unsafe}$ is the set of initial
states; and $S_s^p = \{ s \mid s \in S_{unsafe} \wedge s \models unsafe \}$ is the set of success states.
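
For reference, the standard CTL identities behind this step can be written out as follows. The notation $\mathit{Pre}$ (pre-image under $R$) and the iterates $Y_i$ are introduced here for illustration only; this is the textbook characterization and not a transcription of Figure 2.

```latex
\begin{align*}
  \neg\,\mathbf{AG}(\neg \mathit{unsafe}) &\equiv \mathbf{EF}\,\mathit{unsafe}, \\
  \mathbf{EF}\,\mathit{unsafe} &= \mu Y.\; \mathit{unsafe} \,\lor\, \mathbf{EX}\,Y, \\
  \mathit{Pre}(Y) &= \{\, s \mid \exists s'.\ (s, s') \in R \,\wedge\, s' \in Y \,\}, \\
  Y_0 = \emptyset, \qquad Y_{i+1} &= [\![\mathit{unsafe}]\!] \,\cup\, \mathit{Pre}(Y_i), \\
  S_{unsafe} &= S_r \,\cap\, [\![\mathbf{EF}\,\mathit{unsafe}]\!].
\end{align*}
```

On a finite model the sequence $Y_0 \subseteq Y_1 \subseteq \ldots$ stabilizes after finitely many steps, and its limit is $[\![\mathbf{EF}\,\mathit{unsafe}]\!]$.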

In  symbolic  model  checkers,  such  as  NuSMV,  the  transition  relation  and  sets  of  states  are 
represented  using  BDDs  [Bry86],  a  compact  representation  for  boolean  functions.  There  are  efficient  BDD 
algorithms  for  all  operations  used  in  the  algorithm  shown  in  Figure  2. 
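
To make the steps concrete, here is an explicit-set analogue of the construction in Python. It is a sketch under stated assumptions: plain Python sets stand in for BDDs, the names `pre_image` and `generate_attack_graph` are hypothetical, and the code follows the prose description above rather than the NuSMV implementation.

```python
def pre_image(target, relation):
    """States with at least one successor in `target` (the EX operator)."""
    return {s for (s, t) in relation if t in target}


def generate_attack_graph(S0, R, unsafe_states):
    """Return (S_unsafe, R_p, S0_p, Ss_p) for the property AG(not unsafe)."""
    # Step 1: S_r, the states reachable from the initial states.
    S_r = set(S0)
    frontier = set(S0)
    while frontier:
        frontier = {t for (s, t) in R if s in frontier} - S_r
        S_r |= frontier

    # Step 2: least fix-point of Y = unsafe | Pre(Y), i.e. the states
    # from which some path leads to an unsafe state.
    Y = set(unsafe_states)
    while True:
        bigger = Y | pre_image(Y, R)
        if bigger == Y:
            break
        Y = bigger

    # Step 3: restrict states and transitions to S_unsafe.
    S_unsafe = S_r & Y
    R_p = {(s, t) for (s, t) in R if s in S_unsafe and t in S_unsafe}
    S0_p = set(S0) & S_unsafe
    Ss_p = S_unsafe & set(unsafe_states)
    return S_unsafe, R_p, S0_p, Ss_p


# Hypothetical model: "c" is unsafe; "d" is a dead end; "e" is unreachable.
R = {("a", "b"), ("b", "c"), ("a", "d"), ("d", "d"), ("e", "c")}
S_unsafe, R_p, S0_p, Ss_p = generate_attack_graph({"a"}, R, {"c"})
```

A symbolic implementation performs the same three steps, but each set operation becomes a BDD operation, so the states are never enumerated one by one.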

3.3  Attack  Graph  Properties 

We  can  show  that  an  attack  graph  G  generated  by  the  algorithm  in  Figure  2  is  exhaustive  (Lemma 
1(a))  and  succinct  with  respect  to  states  and  transitions  (Lemmas  1(b)  and  1(c)). 

Lemma  1 

(a) (Exhaustive) An execution e of the input model $(S, R, S_0, L)$ violates the property p = AG(¬unsafe) if
and only if e is an attack in the attack graph $G = (S_{unsafe}, R^p, S_0^p, S_s^p, L)$.

(b) (Succinct states) A state s of the input model $(S, R, S_0, L)$ is in the attack graph G if and only if there
is an attack in G that contains s.

(c) (Succinct transitions) A transition $t = (s_1, s_2)$ of the input model $(S, R, S_0, L)$ is in the attack graph G
if and only if there is an attack in G that includes t.

Proof: 

(a) Exhaustive. (=>) Let $e = s_0 t_0 \ldots t_{n-1} s_n$ be a (finite) execution of the input model such that $s_n$ is an
unsafe state. To prove that e is an attack in G, it is sufficient to show (1) $s_0 \in S_0^p$, (2) $s_n \in S_s^p$, and (3) for all
$0 \le k \le n$, $s_k \in S_{unsafe}$, and for all $0 \le k < n$, $t_k \in R^p$.

Since unsafe holds at $s_n$ and for all k there is a path from $s_k$ to $s_n$ in the input model, by definition
every $s_k$ along e violates AG(¬unsafe). Therefore, by construction, every $s_k$ is in $S_{unsafe}$ and every $t_k$ is in
$R^p$. (1), (2), and (3) follow immediately.

(<=) Suppose that $e = s_0 t_0 \ldots t_{n-1} s_n$ is an attack in the attack graph G. By construction, all states and
transitions of e are also states and transitions in the input model. Since e is an attack, $s_0 \in S_0^p$ and $s_n \in S_s^p$.
Therefore, $s_0 \in S_0$ and $s_n \in S$. So e is an execution of the input model, its first state is an initial state of the
model, and p is false in its final state. It follows that e violates the property AG(¬unsafe).

(b) Succinct states. (=>) By construction of the algorithm in Figure 2, all states generated for the
attack graph are reachable from an initial state, and all of them violate AG(¬unsafe). Therefore, for any
such state s in the input model, there is a path $e_1$ from an initial state to s, and there is a path $e_2$ from s to an
unsafe state.

The concatenation of $e_1$ and $e_2$ is an execution e of the input model that violates AG(¬unsafe). By
Lemma 1(a), e is an attack in G. Since e contains s, the proof is complete.

(<=)  If  there  is  an  attack  in  G  that  contains  s,  then  trivially  s  is  in  G. 

(c) Succinct transitions. (=>) By Lemma 1(b), there is an attack $e_1 = q_0 t_0 \ldots s_1 \ldots t_{m-1} q_m$ that contains
state $s_1$ and an attack $e_2 = r_0 u_0 \ldots s_2 \ldots u_{n-1} r_n$ that contains state $s_2$. So the following attack
includes both states $s_1$ and $s_2$ and the transition t: $e = q_0 t_0 \ldots s_1 \, t \, s_2 \ldots u_{n-1} r_n$.

(<=)  If  there  is  an  attack  in  G  that  contains  t,  then  trivially  t  is  in  G. 

4  A  Simple  Intrusion  Detection  Example 

Consider the example network shown in Figure 3. There are two target hosts, ip1 and ip2, and a
firewall  separating  them  from  the  rest  of  the  Internet.  As  shown,  each  host  is  running  two  of  three  possible 
services  (ftp,  sshd,  a  database).  An  intrusion  detection  system  (IDS)  monitors  the  network  traffic  between 
the  target  hosts  and  the  outside  world.  There  are  four  possible  atomic  attacks,  identified  numerically  as 
follows:  (0)  sshd  buffer  overflow,  (1)  ftp  .rhosts,  (2)  remote  login,  and  (3)  local  buffer  overflow.  If  an 
atomic  attack  is  detectable,  the  intrusion  detection  system  will  trigger  an  alarm;  if  an  attack  is  stealthy,  the 
IDS  misses  it.  The  ftp  .rhosts  attack  needs  to  find  the  target  host  with  two  vulnerabilities:  a  writable  home 
directory  and  an  executable  command  shell  assigned  to  the  ftp  user  name.  The  local  buffer  overflow 
exploits  a  vulnerable  version  of  the  xterm  executable. 

In  this  section,  we  construct  a  finite  state  model  of  the  example  network  so  that  each  state 
transition  corresponds  to  a  single  atomic  attack  by  the  intruder.  A  state  in  the  model  represents  the  state  of 
the system between atomic attacks. A typical transition from state $s_1$ to state $s_2$ corresponds to an atomic
attack whose preconditions are satisfied in $s_1$ and whose effects hold in state $s_2$.

The intruder launches his attack starting from a single computer, ipa, which lies outside the
firewall. His eventual goal is to disrupt the functioning of the database. To do so, the intruder needs root
access on the database host ip2.

Figure 3: Example Network


4.1  States  of  the  Finite  State  Machine  Model 
The  Network 

We model the network as a set of facts, each represented as a relational predicate. The state of the
network specifies services, host vulnerabilities, connectivity, and a remote login trust relationship between
hosts. There are six boolean variables for each host, specifying whether any of the three modeled services
are running and whether any vulnerabilities are present on that host.


variable      meaning
$ssh_h$       ssh service is running on host h
$ftp_h$       ftp service is running on host h
$data_h$      database is running on host h
$wdir_h$      ftp home directory is writable on host h
$fshell_h$    ftp user has executable shell on host h
$xterm_h$     xterm executable is vulnerable to overflow on host h

Connectivity is expressed as a ternary relation $R \subseteq Host \times Host \times Port$, where $R(h_1, h_2, p)$ means
that host $h_2$ is reachable from host $h_1$ on port $p$. The constants $sp$ and $fp$ will refer to the specific ports for
the ssh and ftp services, respectively. Slightly abusing notation (by overloading $R$), we write $R(h_1, h_2)$ when
there is a network route from $h_1$ to $h_2$. We model trust as a binary relation $RshTrust \subseteq Host \times Host$, where
$RshTrust(h_1, h_2)$ indicates that a user may log in from host $h_2$ to host $h_1$ without authentication (i.e., host $h_1$
“trusts” host $h_2$).

The  Intruder 

The function $plvl_A$: Hosts → {none, user, root} gives the level of privilege that intruder A has on
each host. There is a total order on the privilege levels: none < user < root.

Several state variables specify which attack the intruder will attempt next:


variable    meaning
attack      attack type
source      source host
target      target host
strain      stealthy/detectable attack

If an attack is detectable, it will trigger an alarm when executed on a host or network segment monitored by
the IDS; if an attack is stealthy, the IDS does not detect it.

We specify the IDS with a function ids: Host × Host × Attack → {d, s, b}, where $ids(h_1, h_2, a) = d$
if attack a is detectable when executed with source host $h_1$ and target host $h_2$; $ids(h_1, h_2, a) = s$ if attack a is
stealthy when executed with source host $h_1$ and target host $h_2$; and $ids(h_1, h_2, a) = b$ if attack a has both
detectable and stealthy strains, and success in detecting the attack depends on which strain is used. When
$h_1$ and $h_2$ refer to the same host, $ids(h_1, h_2, a)$ specifies the intrusion detection system component (if any)
located on that host. When $h_1$ and $h_2$ refer to different hosts, $ids(h_1, h_2, a)$ specifies the intrusion
detection system component (if any) monitoring the network path between $h_1$ and $h_2$. In addition, a global
boolean variable specifies whether the IDS alarm has been triggered by any previously executed atomic
attack.
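
One way to picture the resulting state is as an immutable record. The Python sketch below is illustrative only: the type names (`Priv`, `HostFacts`, `NetworkState`), the helper `ids`, and every concrete value in the example instance (including which services run on which host) are assumptions made for the illustration, not data taken from our model. Treating any triple missing from the IDS table as stealthy is likewise an assumption of the sketch.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, FrozenSet, Tuple


class Priv(IntEnum):
    """Privilege levels with the total order none < user < root."""
    NONE = 0
    USER = 1
    ROOT = 2


@dataclass(frozen=True)
class HostFacts:
    """The six boolean variables kept for each host h."""
    ssh: bool = False
    ftp: bool = False
    data: bool = False
    wdir: bool = False
    fshell: bool = False
    xterm: bool = False


@dataclass(frozen=True)
class NetworkState:
    hosts: Tuple[HostFacts, ...]              # indexed by host number
    reach: FrozenSet[Tuple[int, int, str]]    # R(h1, h2, p)
    rsh_trust: FrozenSet[Tuple[int, int]]     # RshTrust(h1, h2)
    plvl: Tuple[Priv, ...]                    # intruder privilege per host
    alarm: bool = False                       # global IDS alarm flag


IdsTable = Dict[Tuple[int, int, str], str]


def ids(table: IdsTable, h1: int, h2: int, attack: str) -> str:
    """Return 'd', 's', or 'b' for attack `attack` from h1 to h2.

    Assumption of this sketch: unlisted triples are stealthy ('s').
    """
    return table.get((h1, h2, attack), "s")


# Hypothetical instance; host 0 plays the role of the intruder's machine.
IDS_TABLE: IdsTable = {
    (0, 1, "rsh-login"): "d",
    (0, 2, "rsh-login"): "d",
    (0, 1, "sshd-buffer-overflow"): "b",
    (0, 2, "sshd-buffer-overflow"): "b",
}
initial = NetworkState(
    hosts=(HostFacts(), HostFacts(ssh=True, ftp=True), HostFacts(ftp=True, data=True)),
    reach=frozenset({(0, 1, "sp"), (0, 1, "fp"), (0, 2, "fp")}),
    rsh_trust=frozenset(),
    plvl=(Priv.ROOT, Priv.NONE, Priv.NONE),
)
```

Freezing the record makes states hashable, so they can be stored directly in the state sets that a graph search manipulates.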

4.2  Initial  States 

Initially, there is no trust between any of the hosts; the trust relation RshTrust is empty. The connectivity
relation R is shown in the following table. An entry in the table corresponds to a pair of hosts ($h_1$, $h_2$). Each
entry is a triple of boolean values. The first value is ‘y’ if $h_1$ and $h_2$ are connected by a physical link, the
second value is ‘y’ if $h_1$ can connect to $h_2$ on the ftp port, and the third value is ‘y’ if $h_1$ can connect to $h_2$ on
the sshd port.


[Table: connectivity relation R, one triple of ‘y’/‘n’ values per pair of hosts; entries not recoverable]

We use the connectivity relation to reflect the firewall rule sets as well as the existence of physical
links. For the table above, the firewall is open and does not place any restrictions on the flow of network
traffic.

Initially, the intruder has root privileges on his own machine ipa and no privileges on the other
hosts.

The paths between (ipa, ip1) and between (ipa, ip2) are monitored by the single network-based IDS.
The path between (ip1, ip2) is not monitored. There are no other host-based intrusion detection components.
The IDS detects the remote login attack and the detectable strains of the sshd buffer overflow attack.

4.3  Transitions 

Our model has nondeterministic state transitions. If the current state of the network satisfies the
preconditions of more than one atomic attack rule, the intruder nondeterministically “chooses” one of
those attacks. The state then changes according to the effects clause of the chosen attack rule. The intruder
repeats this process until his goal is achieved.

We model four atomic attacks. Throughout the description, S is used to designate the source host
and T the target host. Recall that R(S, T, p) denotes that host T is reachable from host S on port p.

Sshd  Buffer  Overflow 

This  remote-to-root  attack  immediately  gives  a  remote  user  a  root  shell  on  the  target  machine. 


attack sshd-buffer-overflow is
    intruder preconditions
        $plvl_A(S) \ge user$     User-level privileges on host S
        $plvl_A(T) < root$       No root-level privileges on host T
    network preconditions
        $ssh_T$                  Host T is running sshd
        $R(S, T, sp)$            Host T is reachable from S on port sp
    intruder effects
        $plvl_A(T) = root$       Root-level privileges on host T
    network effects
        $\neg ssh_T$             Host T is not running sshd
end

Ftp  .rhosts 

Using an ftp vulnerability, the intruder creates an .rhosts file in the ftp home directory, creating a
remote login trust relationship between his machine and the target machine.

attack ftp-rhosts is
    intruder preconditions
        $plvl_A(S) \ge user$                 User-level privileges on host S
    network preconditions
        $ftp_T$                              Host T is running ftp
        $R(S, T, fp)$                        Host T is reachable from S on port fp
        $wdir_T$                             Ftp directory writable on host T
        $fshell_T$                           Ftp user has been assigned a valid shell on host T
        $\exists X.\ \neg RshTrust(X, T)$    No rsh trust for some host X and T
    intruder effects
        none
    network effects
        $\forall X.\ RshTrust(X, T)$         Rsh trust between all hosts and T
end

Remote  Login 

Using  an  existing  remote  login  trust  relationship  between  two  machines,  the  intruder  logs  in  from 
one  machine  to  another,  getting  a  user  shell  without  supplying  a  password.  This  operation  is  usually  a 
legitimate action performed by regular users, but from the intruder’s viewpoint, it is an atomic attack.

attack rsh-login is
    intruder preconditions
        $plvl_A(S) \ge user$     User-level privileges on host S
        $plvl_A(T) = none$       No privileges on host T
    network preconditions
        $RshTrust(S, T)$         Rsh trust between S and T
        $R(S, T)$                Host T is reachable from S
    intruder effects
        $plvl_A(T) = user$       User-level privileges on host T
    network effects
        none
end

Local  Buffer  Overflow 

If  the  intruder  has  acquired  a  user  shell  on  the  target  machine,  the  next  step  is  to  exploit  a  buffer 
overflow vulnerability on a setuid root file to gain root access.

attack local-setuid-buffer-overflow is
    intruder preconditions
        $plvl_A(T) = user$       User-level privileges on host T
    network preconditions
        $xterm_T$                There is a vulnerable xterm executable
    intruder effects
        $plvl_A(T) = root$       Root-level privileges on host T
    network effects
        none
end

It  is  easy  to  see  that  each  atomic  attack  strictly  increases  either  the  intruder’s  privilege  level  on  the 
target  host  or  remote  login  trust  between  hosts.  This  means  that  the  attack  graph  has  no  cycles. 
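
The precondition/effect format maps naturally onto functions from a state to an optional successor state. The Python sketch below encodes two of the four rules over a deliberately reduced state; the `State` record, the rule functions, the `successors` helper, and the example values are all hypothetical, and requiring source and target to coincide for the local attack is an assumption of the sketch.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

NONE, USER, ROOT = 0, 1, 2   # privilege levels, ordered none < user < root


@dataclass(frozen=True)
class State:
    plvl: Tuple[int, ...]                   # intruder privilege per host
    ssh: Tuple[bool, ...]                   # sshd running per host
    xterm: Tuple[bool, ...]                 # vulnerable xterm per host
    reach_ssh: FrozenSet[Tuple[int, int]]   # pairs (S, T) with R(S, T, sp)


def _set(values, index, value):
    """Copy of the tuple `values` with position `index` replaced."""
    return values[:index] + (value,) + values[index + 1:]


def sshd_buffer_overflow(st, s, t):
    ok = (st.plvl[s] >= USER and st.plvl[t] < ROOT
          and st.ssh[t] and (s, t) in st.reach_ssh)
    if not ok:
        return None
    # Effects: root on T, and sshd on T stops running.
    return replace(st, plvl=_set(st.plvl, t, ROOT), ssh=_set(st.ssh, t, False))


def local_buffer_overflow(st, s, t):
    # Assumption: a local attack is launched on the target itself (s == t).
    if not (s == t and st.plvl[t] == USER and st.xterm[t]):
        return None
    return replace(st, plvl=_set(st.plvl, t, ROOT))


RULES = {
    "sshd-buffer-overflow": sshd_buffer_overflow,
    "local-setuid-buffer-overflow": local_buffer_overflow,
}


def successors(st):
    """All (attack, source, target, next_state) choices open to the intruder."""
    hosts = range(len(st.plvl))
    return [
        (name, s, t, nxt)
        for name, rule in RULES.items()
        for s in hosts
        for t in hosts
        if (nxt := rule(st, s, t)) is not None
    ]


# Hypothetical three-host state: intruder has root on host 0 only.
first = State(
    plvl=(ROOT, NONE, NONE),
    ssh=(False, True, False),
    xterm=(False, False, True),
    reach_ssh=frozenset({(0, 1)}),
)
moves = successors(first)
```

Each rule either raises a privilege level or leaves the state unreachable by that rule again, which is the monotonicity that rules out cycles.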

From  our  finite  model  we  can  now  automatically  construct  attack  graphs  that  demonstrate  how  the 
intruder  can  violate  various  security  properties.  Suppose  we  want  to  generate  all  attacks  that  demonstrate 
how  the  intruder  can  gain  root  privilege  on  host  ip2  and  remain  undetected  by  the  IDS.  The  following  CTL 
formula  expresses  the  safety  property  that  the  intruder  on  host  ip2  always  has  privilege  level  below  root  or 
is detected.

AG(network.adversary.privilege[2] < network.priv.root | network.detected)

Figure  4  shows  the  attack  graph  produced  by  our  tool  for  this  property.  Each  node  is  labeled  by  an 
attack id number, which corresponds to the atomic attack to be attempted next; a flag S/D, which indicates whether
the attack is stealthy or detectable by the intrusion detection system; and the numbers of the source and
target hosts (ipa corresponds to host number 0).

Any  path  in  the  graph  from  the  root  node  to  a  leaf  node  shows  a  sequence  of  atomic  attacks  that 
the  intruder  can  employ  to  achieve  his  goal  while  remaining  undetected.  For  instance,  the  path  highlighted 
by  dashed-boxed  nodes  consists  of  the  following  sequence  of  four  atomic  attacks:  overflow  sshd  buffer  on 
host 1, overwrite .rhosts file on host 2 to establish rsh trust between hosts 1 and 2, log in using rsh from
host  1  to  host  2,  and  finally,  overflow  a  local  buffer  on  host  2  to  obtain  root  privileges. 
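
Because the attack graph for this example is acyclic, its root-to-leaf paths can be listed with a simple recursive walk. The following Python sketch is illustrative; the function name `attack_paths` and the four-node diamond graph are assumptions and do not reproduce Figure 4.

```python
def attack_paths(edges, roots, leaves):
    """Yield every path from a root to a leaf in an acyclic graph."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)

    def walk(node, prefix):
        path = prefix + [node]
        if node in leaves:
            yield path
        # Terminates because the graph is assumed to have no cycles.
        for nxt in sorted(succ.get(node, [])):
            yield from walk(nxt, path)

    for root in roots:
        yield from walk(root, [])


# Hypothetical diamond-shaped graph with one root and one leaf.
paths = list(attack_paths(
    edges=[("root", "a"), ("root", "b"), ("a", "goal"), ("b", "goal")],
    roots=["root"],
    leaves={"goal"},
))
```

The number of paths can grow exponentially with the size of the graph, so such an enumeration is practical only for small examples.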

We  have  also  expanded  the  example  described  above  by  adding  two  additional  hosts,  four 
additional  atomic  attacks,  several  new  vulnerabilities,  and  flexible  firewall  configurations.  For  this  larger 
example  the  attack  graph  has  5948  nodes  and  68364  edges. 
5 Minimization Analysis

Once we have an attack graph generated for a specific network with respect to a given safety
property, we can utilize it for further analysis. A system administrator has available to him a set of
measures, such as deploying additional intrusion detection tools, adding firewalls, upgrading software, and
deleting user accounts.

Figure 4: Attack Graph


Figure  5:  Attack  Graph  Analysis 


Minimization  analysis  helps  analysts  make  decisions  about  what  measures  to  deploy  depending  on 
what set of atomic attacks they thwart. It helps us answer questions such as 1, 2, and 3 posed in Section 1.1.
Let  us  look  at  each  question  in  turn  since  they  suggest  different  solution  approaches. 

5.1  Minimal  Subsets  of  Atomic  Attacks  to  Thwart 

Suppose we want to find a minimal set, A, of atomic attacks that must be prevented to guarantee
the adversary cannot achieve his goal. A system analyst can use this information in deciding to choose one
measure, $m_1$, which eliminates this minimal set of attacks, over another measure, $m_2$, perhaps cheaper than
$m_1$, but ineffective with respect to A.

A  naive  solution  is  as  follows: 
1. Make only a subset of the atomic attacks available to the intruder.

2.  Run  the  model  checking  algorithm  to  determine  if  the  adversary  can  succeed. 

3.  Do  Steps  1  and  2  for  all  possible  non-empty  subsets  of  atomic  attacks. 
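
The three steps above can be sketched in Python as follows. The function `successful_subsets` and the oracle `can_succeed` are hypothetical names; in practice the oracle would be a run of the model checker on the restricted model. The stub oracle used here is an assumption that simply encodes the outcome reported for our example, where the intruder succeeds only if every attack other than the sshd buffer overflow is available.

```python
from itertools import combinations


def successful_subsets(atomic_attacks, can_succeed):
    """Try every non-empty subset; keep those that let the intruder succeed."""
    winners = []
    for size in range(1, len(atomic_attacks) + 1):
        for subset in combinations(atomic_attacks, size):
            available = frozenset(subset)
            if can_succeed(available):        # stands in for a model-checker run
                winners.append(available)
    return winners


ATTACKS = [
    "sshd-buffer-overflow",
    "ftp-rhosts",
    "rsh-login",
    "local-setuid-buffer-overflow",
]

# Stub oracle (assumption): success requires the three non-sshd attacks.
REQUIRED = frozenset(ATTACKS[1:])


def stub_oracle(available):
    return REQUIRED <= available


winners = successful_subsets(ATTACKS, stub_oracle)
```

With four atomic attacks the loop makes 15 oracle calls; with n attacks it makes 2^n - 1.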

Clearly  this  solution  is  exponential  in  the  number  of  atomic  attacks.  For  our  example,  however, 
the  number  is  small,  and  we  can  easily  determine  this  minimal  set.  As  a  by-product  of  determining  this  set, 
we can easily answer the first question posed in Section 1.

Question  1:  What  successful  attacks  are  undetected  by  the  IDS? 

Answer:  To  answer  this  question,  we  modify  the  model  slightly.  For  simplicity,  we  nondeterministically 
decide  which  subset  to  consider  initially,  before  any  attack  begins;  once  the  choice  is  made,  the  subset  of 
available  atomic  attacks  remains  constant  during  any  given  attack.  We  ran  the  model  checker  on  the 
modified  model  with  the  invariant  property  that  says  the  intruder  never  obtains  root  privilege  on  host  ip2. 

AG(network.adversary.privilege[2] < network.priv.root)

The  post-processor  marked  the  states  where  the  intruder  has  been  detected  by  the  IDS.  The  result 
is  shown  in  Figure  5.  The  white  rectangles  indicate  states  where  the  attacker  had  not  yet  been  detected  by 
the  intrusion  detection  system.  The  black  rectangles  are  states  where  the  intrusion  detection  system  has 
sounded  an  alarm.  Thus,  white  leaf  nodes  are  desirable  for  the  attacker  because  his  objective  is  achieved 
without detection. Black leaf nodes are less desirable: the attacker achieves his objective, but the alarm
goes off.


The resolution of which atomic attacks are available to the intruder happens in the circular nodes
near the root of the graph. The first transition out of the root (initial) state picks the subset of attacks that
the intruder will use. Each child of the root node is itself the root of a disjoint subgraph where the subset of
atomic attacks chosen for that child is used. Note that the number of such subgraphs descending from the
root node corresponds to the number of subsets of atomic attacks with which the intruder can be successful.
The model checker determines that for any other possible subset, there is no possible successful sequence
of atomic attacks.

The  root  of  the  graph  in  Figure  5  has  two  subgraphs,  corresponding  to  the  two  subsets  of  atomic 
attacks  that  will  allow  the  intruder  to  succeed.  In  the  left  subgraph  the  sshd  buffer  overflow  attack  is  not 
available  to  the  intruder;  it  can  be  readily  seen  that  the  intruder  can  still  succeed,  but  cannot  do  so  while 
remaining  undetected  by  the  IDS.  In  the  right  subgraph,  all  attacks  are  available.  Thus,  the  entire  attack 
graph  implies  that  all  atomic  attacks  other  than  the  sshd  attack  are  indispensable:  the  intruder  cannot 
succeed  without  them.  That  is,  for  no  other  subset  of  atomic  attacks  can  the  intruder  succeed  in  achieving 
his  goal.  The  analyst  can  use  this  information  to  guide  decisions  on  which  network  defenses  can  be 
profitably  upgraded. 

The  white  cluster  in  the  middle  of  the  figure  is  isomorphic  to  the  attack  graph  presented  in  Figure 
4;  it  shows  attacks  in  which  the  intruder  can  achieve  his  objective  without  detection  (i.e.,  all  paths  by  which 
the  intruder  reaches  a  white  leaf  in  the  graph). 


AG(¬unsafe)

Let $A$ be the set of atomic attacks, and let $G = (S, E, s_0, s_s, L)$ be the attack graph, where $S$ is the set of
states, $E \subseteq S \times S$ is the set of edges, $s_0 \in S$ is the initial state, $s_s \in S$ is the success state for the intruder, and
$L : E \to A \cup \{\epsilon\}$ is a labeling function where $L(e) = a$ if an edge $e = (s \to s')$ corresponds to an atomic
attack $a$, and $L(e) = \epsilon$ otherwise. Edges labeled with $\epsilon$ represent system transitions that do not correspond to an
atomic attack. Moreover, as demonstrated below, additional $\epsilon$ edges can also be introduced by our
construction. Without loss of generality we can assume that there is a single initial state and a single success state. For
example, consider an attack graph with multiple initial states $s_0^1, \ldots, s_0^j$ and success states $s_s^1, \ldots, s_s^u$. We can
add a new initial state $s_0$ and a new success state $s_s$ with $\epsilon$-labeled edges $(s_0, s_0^m)$ $(1 \le m \le j)$ and $(s_s^t, s_s)$
$(1 \le t \le u)$.

Suppose we are also given a finite set of measures $M = \{m_1, \ldots, m_k\}$ and a function $covers : M \to 2^A$.

An atomic attack $a \in covers(m_i)$ if adopting measure $m_i$ removes the atomic attack $a$.

We are now ready to address the question of what measures a system administrator should deploy
to ensure the system is safe. Again, there is a naive solution, that is, to try all possible subsets of measures
$M' \subseteq M$ and determine which of those make the system safe. We discuss this approach in the context of
question 2:

Question 2: If all measures for protecting a network are deployed, does the system become safe?

Answer: A network administrator wants to find out whether adopting measures from a set $M' \subseteq M$ will
make the network safe. This question can be answered in linear time using the attack graph $G$. First, we
define $covers(M')$ as $\bigcup_{m \in M'} covers(m)$. Next, we remove all edges $e$ from $G$ such that $L(e) \in covers(M')$.
The network is safe iff the success state $s_s$ is not reachable from the initial state $s_0$. This simple reachability
question can be answered in time that is linear in the size of the graph.
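The check described in this answer can be sketched as follows. The representation is an assumption made for illustration: edges are hypothetical `(source, target, label)` triples with `None` for transitions that are not atomic attacks, and `covers` is a dictionary from measure names to the sets of attacks they remove.

```python
from collections import deque


def is_safe(edges, start, goal, covers, deployed):
    """Return True iff the goal is unreachable once every attack covered
    by a deployed measure has been removed from the graph."""
    removed = set()
    for measure in deployed:
        removed |= covers[measure]          # covers(M') as a union
    adjacency = {}
    for src, dst, label in edges:
        if label not in removed:            # unlabeled edges are always kept
            adjacency.setdefault(src, []).append(dst)
    queue, seen = deque([start]), {start}
    while queue:                            # breadth-first search, linear time
        node = queue.popleft()
        if node == goal:
            return False
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

Each edge and state is visited at most once, which matches the linear-time claim for a fixed set of deployed measures.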

As the set of measures grows (and as the set of atomic attacks grows), we really would like to have
the system administrator choose the smallest subset of measures that would guarantee the networked
system is safe. We address this decision in the context of question 3:

Question  3:  Given  a  set  of  measures  M,  what  is  the  smallest  subset  of  measures  M’  whose  deployment 
makes  the  system  safe? 

Answer: A network administrator wishes to find a subset $M' \subseteq M$ of smallest size, such that adopting the
measures in the set $M'$ will make the network safe. Unfortunately, this problem is NP-complete, but we
develop good approximation algorithms. We proceed in two steps:

Step  1:  Finding  a  small  set  of  atomic  attacks. 

In this step, we find a set of atomic attacks whose removal makes the network safe. As described
in the previous section, checking every possible subset of attacks is exponential in the number of
attacks. In an earlier conference paper [SHJ_02], we show that finding the minimum set of atomic
attacks which must be removed to thwart an intruder is in fact NP-complete. We repeat part of the
proof below (see Lemma 2). We also demonstrated how a minimal set can be found in
polynomial time. In this paper, we further explore the complexity of this problem. Section 5.2.1
proves that the problem of finding a minimum set of attacks is polynomially equivalent to the
minimum hitting set problem, where the collection of sets is represented as a labeled directed graph.
This reduction gave us the additional insight needed to find a
greedy algorithm with provable bounds. Recall that $M = \{m_1, \ldots, m_k\}$ is the set of measures and
$covers : M \to 2^A$ is a function, where $covers(m_i)$ represents the set of atomic attacks that are
removed by adopting the measure $m_i$. With each attack $a$ in the set $A$ we associate a set of
measures $M(a) = \{m_i \mid a \in covers(m_i)\}$. The set of attacks $A'$ defines a collection
$C_{A'} = \{M(a) \mid a \in A'\}$ of subsets of $M$. We wish to find the smallest subset $M' \subseteq M$ such that
for all $a \in A'$ there exists an $m_i \in M'$ such that $a \in covers(m_i)$, or equivalently $M' \cap M(a) \ne \emptyset$.
This is known as the minimum hitting set problem, which is NP-complete, but good approximation
algorithms exist to solve this problem (see Section 5.2.2).
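As a small illustration of how the sets of measures associated with each attack are derived, the sketch below inverts the covers function and checks whether a chosen set of measures hits every attack. The dictionary layout and the function names are assumptions made for this example only.

```python
def measures_per_attack(covers, attacks):
    """For every attack a, return the set of measures whose adoption removes a."""
    return {
        a: {m for m, removed in covers.items() if a in removed}
        for a in attacks
    }


def hits_every_attack(chosen, covers, attacks):
    """True iff the chosen measures intersect the measure set of every attack,
    i.e. every attack in `attacks` is removed by at least one chosen measure."""
    per_attack = measures_per_attack(covers, attacks)
    return all(per_attack[a] & set(chosen) for a in attacks)
```

Finding the smallest set of measures for which `hits_every_attack` holds is exactly the hitting set formulation given in the text.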

5.2.1 The Minimum Critical Attack Sets and the Minimum Hitting Set Problem

This section addresses the first step in the answer to question 3. Assume that we are given an
attack graph $G = (S, E, s_0, s_s, L)$, where $S$ is the set of states, $E \subseteq S \times S$ is the set of edges, $s_0 \in S$ is the initial
state, $s_s \in S$ is the success state for the intruder, and $L : E \to A \cup \{\epsilon\}$ is a labeling function.
Given a state $s \in S$, a set of attacks $C$ is critical with respect to $s$ if and only if the intruder cannot
reach his goal from $s$ when the attacks in $C$ are removed from his arsenal. Equivalently, $C$ is critical with
respect to $s$ if and only if every path from $s$ to the success state $s_s$ has at least one edge labeled with an
attack $a \in C$.

A critical set corresponding to a state $s$ is minimum (denoted $M(s)$) if there is no critical set $M'(s)$
such that $|M'(s)| < |M(s)|$. In general, there can be multiple minimum sets corresponding to a state $s$. Of
course, all minimum critical sets must be of the same size.

A critical set of an attack graph $G = (S, E, s_0, s_s, L)$ is defined as a critical set corresponding to the
initial state $s_0$. Therefore, the Minimum Critical Set of Attacks (MCSA) problem is the problem of finding a
minimum critical set of attacks $M(s_0)$. The decision version of the problem is defined as follows: given an
attack graph $G = (S, E, s_0, s_s, L)$ and a positive integer $K$, is there a critical set of attacks $A' \subseteq A$ such that
$|A'| \le K$?

Lemma 2 Assume that we are given an attack graph $G = (S, E, s_0, s_s, L)$ and an integer $k$. The MCSA
problem of determining whether there is a critical set $C(s_0)$ such that $|C(s_0)| \le k$ is NP-complete.

Proof: First, we prove that the problem is in NP. Guess a set $C \subseteq A$ with size $\le k$. We need to check that $C$
is a critical set of attacks. This can be accomplished in polynomial time using the reachability algorithm
described before (see the answer to question 2). Therefore, the problem is in NP.

Next, we prove that the problem is NP-hard. The reduction is from the minimum hitting set
problem; details are given in the remainder of this section.

Assume that we are given an attack graph $G = (S, E, s_0, s_s, L)$. A path $\pi$ is a sequence of states
$q_1, \ldots, q_n$ such that $q_i \in S$ and $(q_i, q_{i+1}) \in E$. A complete path starts from the initial state $s_0$ and ends in the
success state $s_s$. The label of a path $\pi = q_1, \ldots, q_n$ (abusing notation, we will denote it also as $L(\pi)$) is a
subset of the set of attacks $A$:

$L(\pi) = \left( \bigcup_{i=1}^{n-1} \{L(q_i, q_{i+1})\} \right) \setminus \{\epsilon\}$.

$L(\pi)$ represents the set of atomic attacks used on the path $\pi$. A set of attacks $A' \subseteq A$ is called realizable in the
attack graph $G$ iff there exists a complete path $\pi$ in $G$ such that $L(\pi) = A'$. In other words, an intruder can use
the set of attacks $A'$ to start from the initial state and reach the success state. The set of all realizable sets in
an attack graph $G$ is denoted by $Rel(G)$. The following lemma is easy to prove and follows directly from the
definitions.

Lemma 3 Assume that we are given an attack graph $G = (S, E, s_0, s_s, L)$. A set of attacks $A$ is critical iff

$\forall A' \in Rel(G) .\; A' \cap A \ne \emptyset$.

In  other  words,  all  realizable  sets  have  a  non-empty  intersection  with  a  critical  set  A. 
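The definitions of realizable sets and the criticality condition of Lemma 3 can be made concrete with a short sketch. It assumes an acyclic graph given as hypothetical `(source, target, label)` edges, with `None` for unlabeled transitions, and it enumerates all complete paths explicitly, so it is exponential in the worst case and intended only for small examples.

```python
def realizable_sets(edges, start, goal):
    """Enumerate the label sets of all complete paths of an acyclic graph."""
    adjacency = {}
    for src, dst, label in edges:
        adjacency.setdefault(src, []).append((dst, label))
    found = set()

    def walk(node, used):
        if node == goal:
            found.add(frozenset(used))
            return
        for dst, label in adjacency.get(node, []):
            # Unlabeled transitions do not contribute to the path label.
            walk(dst, used if label is None else used | {label})

    walk(start, frozenset())
    return found


def is_critical(candidate, rel):
    """Lemma 3: a set is critical iff it intersects every realizable set."""
    return all(set(candidate) & r for r in rel)
```

For a graph with complete paths labeled {a, b} and {c}, the set {a, c} is critical while {a} alone is not, because the path through c remains available.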

Question: Is there a subset $S' \subseteq S$ with $|S'| \le K$ such that $S'$ contains at least one element from each subset in
$C$?

Lemma 3 proves that the problem of finding whether the attack graph $G$ has a critical set of size $\le K$ is the
hitting set problem with $C = Rel(G)$, $S = A$, and $K$.

Next suppose we have an instance $(C, S, K)$ of the hitting set problem. We will construct an attack
graph $G' = (S', E', s_0', s_s', L')$, where $L' : E' \to S \cup \{\epsilon\}$, i.e., the set of attacks used in the attack graph $G'$ is $S$.
Moreover, the set of realizable sets $Rel(G')$ of the graph $G'$ is the collection $C$. A critical set of size $\le K$ of
the attack graph $G'$ is a hitting set for the collection $C$. Next, we describe the construction of $G'$. Let $C =
\{C_1, \ldots, C_m\}$ be the collection of sets and $S = \{s_1, \ldots, s_n\}$ be the set. We make $m$ copies $S^1, \ldots, S^m$ of the set $S$.
The set of elements in $S^i$ will be denoted by $\{s^i_1, \ldots, s^i_n\}$. The set of states $S'$ in the attack graph $G'$ is
$\{s_0', s_s'\} \cup S^1 \cup \cdots \cup S^m$.


The initial state is $s_0'$ and the final state is $s_s'$. The set of edges $E'$ and the labeling function $L'$ are
defined as follows:

• There is an edge from $s_0'$ to every state in the set $\{s^1_1, \ldots, s^m_1\}$, and the label of the edge $(s_0', s^i_1)$ is
$s_1$ if $s_1 \in C_i$, otherwise it is $\epsilon$.

• For all $1 \le i \le m$ and $1 \le j \le n-1$, there is an edge $(s^i_j, s^i_{j+1})$, and the label of the edge $(s^i_j, s^i_{j+1})$ is $s_{j+1}$ if
$s_{j+1} \in C_i$, otherwise it is $\epsilon$.

• There is an edge from every state in the set $\{s^1_n, \ldots, s^m_n\}$ to the state $s_s'$, and the labels of all these
edges are $\epsilon$.

The sizes of the sets $S'$ and $E'$ in the attack graph $G'$ are $mn + 2$ and $2m + mn$ respectively. It is
easy to see that $Rel(G')$ is equal to $C$, and $S'' \subseteq S$ is a critical set of the attack graph $G'$ iff $S''$ is a hitting set
for the collection $C$. Since the size of $G'$ is polynomial in the size of the instance of the hitting set problem
and the hitting set problem is NP-complete, the MCSA problem is NP-hard. Lemma 2 proves that MCSA is
in NP. Therefore, MCSA is NP-complete. The next example illustrates our construction.
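The construction can be sketched in code as follows. The state names `"init"` and `"success"`, the tuple `(i, j)` standing for the j-th element of the i-th copy of the set, and the helper names are all assumptions made for this illustration; `None` plays the role of an unlabeled edge.

```python
def _label(element, subset):
    """Label an edge with the element if it belongs to the subset, else leave it unlabeled."""
    return element if element in subset else None


def build_attack_graph(universe, collection):
    """Build the edge list of the attack graph for a hitting set instance.

    universe   -- ordered list of the n elements of the ground set
    collection -- list of m subsets of the universe
    """
    n = len(universe)
    edges = []
    for i, subset in enumerate(collection, start=1):
        # Edge from the initial state into chain i, labeled by the first element if selected.
        edges.append(("init", (i, 1), _label(universe[0], subset)))
        # Edges along chain i: entering (i, j + 1) is labeled by element j + 1 if selected.
        for j in range(1, n):
            edges.append(((i, j), (i, j + 1), _label(universe[j], subset)))
        # Unlabeled edge from the end of chain i to the success state.
        edges.append(((i, n), "success", None))
    return edges


def chain_labels(edges, i):
    """Collect the attack labels on the single complete path through chain i."""
    return {
        label
        for _src, dst, label in edges
        if label is not None and isinstance(dst, tuple) and dst[0] == i
    }
```

Each chain contributes exactly one complete path, and the labels on that path are the elements of the corresponding subset, which is why the realizable sets of the constructed graph coincide with the given collection.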

Note: The discussion above also proves that the problem of finding a minimum set of measures whose
adoption will make the network safe is also NP-complete. One can simply take the set of measures $M$ to be
the set of attacks $A$.

Example 1 We give a short example to illustrate the reduction. Consider a set $S = \{s_1, s_2, s_3\}$. Suppose that
the collection $C$ consists of the following subsets:


C’l  = 

Oi  =  {-Sj.  S3} 

<■'3  =  {-Sj] 


The attack graph $G'$ corresponding to this problem is shown in Figure 6. The set of attacks is $\{s_1,
s_2, s_3\}$. The set of realizable sets $Rel(G')$ is exactly the collection $C$. The set of attacks $\{s_1, s_2\}$ is critical
because every path from $s_0'$ to the success state $s_s'$ uses at least one edge with a label in the set $\{s_1, s_2\}$.
Moreover, $\{s_1, s_2\}$ is a hitting set for the collection $C = \{C_1, C_2, C_3\}$.

The above discussion proves that the problem of finding critical sets in an attack graph is
polynomially equivalent to finding hitting sets for a collection, with one caveat: the collection of sets $C$ is
represented as an attack graph. An attack graph can be an exponentially succinct representation of a
collection of sets. Figure 7 shows an attack graph of linear size whose set of realizable sets is the power set
of $\{s_1, \ldots, s_n\}$. Therefore, the minimum critical set problem is polynomially equivalent to the hitting set
problem where the collection of sets $C$ is represented as a labeled directed graph.
Figure 6: Attack graph corresponding to the collection C.




Figure 7: Attack graph representing an exponential number of realizable sets.

Let $(C, S, K)$ be an instance of the hitting set problem. Let $S'$ and $C'$ be initially the empty set. The greedy
algorithm executes the following steps until $C' = C$.

• Pick an element $s$ out of the set $S \setminus S'$ that covers the maximum number of sets in the collection $C \setminus
C'$. An element $s$ is said to cover a set $S_j \subseteq S$ iff $s \in S_j$.

• Let $s$ be the element picked in the previous step and $C(s)$ be the collection of sets in $C$ covered by
$s$. Update $S'$ and $C'$ as follows:

$S' \leftarrow S' \cup \{s\}$
$C' \leftarrow C' \cup C(s)$

Let $H_d$ be the $d$-th harmonic number $\sum_{i=1}^{d} 1/i$, and let $|C(s)|$ be the number of sets in the collection $C$
that are covered by the element $s$.
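A minimal sketch of the greedy procedure described above follows. The function name, the list-based representation, and the tie-breaking rule (the first maximal element in the order of the universe) are assumptions made for illustration.

```python
def greedy_hitting_set(universe, collection):
    """Repeatedly pick the element that hits the most not-yet-hit sets."""
    chosen = []
    unhit = [set(c) for c in collection]
    while unhit:
        candidates = [s for s in universe if s not in chosen]
        if not candidates:
            raise ValueError("no hitting set exists within the universe")
        # Element covering the maximum number of sets that are still unhit.
        best = max(candidates, key=lambda s: sum(1 for c in unhit if s in c))
        if not any(best in c for c in unhit):
            raise ValueError("some set contains no element of the universe")
        chosen.append(best)
        unhit = [c for c in unhit if best not in c]
    return chosen
```

The result is a hitting set but not necessarily a minimum one; the approximation guarantee is the subject of the lemma that follows.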

Lemma 4 GREEDY-HITTING-SET is a polynomial-time $\rho(n)$-approximation algorithm, where $\rho(n) =
H(\max_{s \in S} \{|C(s)|\})$.

The proof of the lemma follows from the equivalence between the minimum hitting set and the
minimum cover problem [ADP80] and the proof of the approximation factor $\rho(n)$ for the greedy algorithm
for the minimum cover problem [CLR85]. Using the equivalence between the problems of finding a
minimum critical set and a minimum hitting set, we can construct a greedy procedure (called
GREEDY-CRITICAL-SET) for finding a critical set for the attack graph. Assume that we are given an attack graph $G
= (S, E, s_0, s_s, L)$, where $S$ is the set of states, $E \subseteq S \times S$ is the set of edges, $s_0 \in S$ is the initial state, $s_s \in S$ is
the success state for the intruder, and $L : E \to A \cup \{\epsilon\}$ is a labeling function. Moreover, assume that we can
compute in polynomial time the function $\mu_G : A \to \mathbb{N}$, where $\mu_G(a)$ is the number of realizable sets in the
attack graph $G$ that contain the attack $a$. Formally, $\mu_G(a)$ is equal to

$|\{A' \mid a \in A' \text{ and } A' \in Rel(G)\}|$

Initially, let $A'$ be the empty set and $G' = G$. The greedy algorithm GREEDY-CRITICAL-SET executes the
following steps until $G'$ is empty.

• Pick an element $a$ from the set $A \setminus A'$ that maximizes $\mu_{G'}(a)$.

• Let $a$ be the element picked in the previous step. Update $A'$ and $G'$ as follows:

$A' \leftarrow A' \cup \{a\}$

Remove all edges labeled with $a$ from $G'$.
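The greedy procedure can be sketched as below. Two assumptions are made for the sake of a runnable example: the counting function is computed by brute-force enumeration of complete paths in an acyclic graph (the text instead assumes it is available in polynomial time), and the termination condition is read as "no complete path remains". Edges are hypothetical `(source, target, label)` triples with `None` for unlabeled transitions.

```python
def realizable_sets(edges, start, goal):
    """Label sets of all complete paths of an acyclic graph (brute force)."""
    adjacency = {}
    for src, dst, label in edges:
        adjacency.setdefault(src, []).append((dst, label))
    found = set()

    def walk(node, used):
        if node == goal:
            found.add(frozenset(used))
            return
        for dst, label in adjacency.get(node, []):
            walk(dst, used if label is None else used | {label})

    walk(start, frozenset())
    return found


def greedy_critical_set(edges, start, goal):
    """Greedily remove the attack contained in the most realizable sets."""
    critical = set()
    remaining = list(edges)
    while True:
        rel = realizable_sets(remaining, start, goal)
        if not rel:                       # goal no longer reachable
            return critical
        counts = {}
        for labels in rel:
            for attack in labels:
                counts[attack] = counts.get(attack, 0) + 1
        if not counts:
            raise ValueError("goal is reachable without any atomic attack")
        best = max(sorted(counts), key=counts.get)
        critical.add(best)
        remaining = [e for e in remaining if e[2] != best]
```

On a graph whose complete paths are labeled {a, b} and {a, c}, the first iteration picks a, after which the goal is unreachable and the procedure stops with the critical set {a}.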

Lemma 5 GREEDY-CRITICAL-SET is a polynomial-time $\rho(n)$-approximation algorithm, where $\rho(n) =
H(\max_{a \in A} \{\mu_G(a)\})$.

Next, we explore conditions under which the function $\mu_G$ can be computed in polynomial time. Assume
that the attack graph $G$ is a DAG. An argument for this was given in Section 4.3. Moreover, assume that
each atomic attack is used only once on a path from the initial state $s_0$ to the success state $s_s$. This is not an
unreasonable assumption because the attack graph edges are labeled with instantiations of the attack templates
shown in Section 4.3; e.g., local-setuid-buffer-overflow attacks on two different hosts are distinct in the
attack graph. Such attack graphs are called use-once DAGs. The following lemma is easy to prove.

Lemma 6 For an attack graph that is a use-once DAG, the function $\mu_G$ can be computed in time that is
linear in the size of the attack graph.

Suppose a system administrator would like to know which measures would increase or decrease
the likelihood of thwarting an attack. If we have probabilities available to us, we can annotate attack
graphs to help system administrators answer such questions.

In  our  work,  we  do  not  require  that  all  transitions  be  given  probabilities;  in  general,  our  annotated 
attack  graphs  can  have  a  mix  of  probabilistic  and  nondeterministic  state  transitions.  We  pursue  the 
implications  of  this  general  kind  of  attack  graph  in  this  section. 

In  general,  we  also  do  not  require  probabilities  to  be  numeric;  they  can  be  symbolic,  e.g.,  “high,” 
“medium,” or “low,” and even partially ordered. In an earlier paper [JW01], we discuss an analysis that
uses  symbolic  probabilities;  in  this  paper,  however,  we  restrict  ourselves  to  numeric  values. 
6.1 Probabilistic Attack Graphs

Suppose that the graph has a state $s$ with only two outgoing transitions. In a regular attack graph,
the choice of which transition to take when the system is in state $s$ is nondeterministic. However, we may
have some empirical data that enables us to estimate that whenever the system is in state $s$, on average it
will take one of the transitions four times out of ten and the other transition the remaining six times. We can
place probabilities 0.4 and 0.6 on the corresponding edges in the attack graph. Intuitively, the
probability of the transition $s \to s'$ represents the likelihood that the atomic attack corresponding to the
transition will succeed. We call a state with known probabilities for outgoing transitions probabilistic.
When we have assigned all known probabilities in this way, we are left with an attack graph that has some
probabilistic and some nondeterministic states in it. We call such mixed attack graphs probabilistic attack
graphs. We use probabilistic attack graphs to evaluate the reliability of a network. Note that probabilities of
all the transitions might not be available because of lack of data, e.g., for a new type of atomic attack.

Since the attack graph includes only those states and transitions that can lead to success states, it
excludes some transitions that exist in the complete model $M$. These excluded transitions can have non-zero
probability, so that the sum of probabilities of transitions from a probabilistic state can be less than 1. To
address this problem, we must model the rest of $M$ in some way. We add a “catch-all” escape state $s_e$ to the
attack graph. A probabilistic state $s$ in the attack graph will have a transition to $s_e$ if and only if in $M$ there is
a transition from $s$ to some state not in the attack graph. The probability of going from $s$ to $s_e$ will be 1
minus the sum of the probabilities of going to other states. There are no transitions out of $s_e$ except a
self-loop (which preserves the totality of the transition relation $\tau$).
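The completion of a probabilistic state's distribution with an escape transition can be sketched as follows. The dictionary layout, the state name `"s_e"`, and the tolerance are assumptions of this example; the sketch infers the need for an escape transition from the missing probability mass rather than from the complete model.

```python
def add_escape_transitions(pi, escape="s_e", tol=1e-12):
    """Return a copy of pi in which every probabilistic state whose outgoing
    probabilities sum to less than 1 gains a transition to the escape state.

    pi -- dict mapping each probabilistic state to a dict {successor: probability}
    """
    completed = {state: dict(dist) for state, dist in pi.items()}
    for dist in completed.values():
        missing = 1.0 - sum(dist.values())
        if missing > tol:
            dist[escape] = missing      # 1 minus the sum of the other probabilities
    return completed
```

A state with outgoing probabilities 0.4 and 0.5 thus receives an escape transition with probability 0.1, while a state whose probabilities already sum to 1 is left unchanged.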

In an attack graph containing the escape state $s_e$, attacks are allowed to terminate in $s_e$. We will call
them escape attacks, or attacks that were pre-empted by the intruder before he reached his goal.

6.1.1  Definition  of  PAGs 

Definition 3 A probabilistic attack graph or PAG is a tuple $G = (S_n, S_p, s_e, S, \tau, \pi, S_0, S_s, L)$, where $S_n$ is a
set of nondeterministic states, $S_p$ is a set of probabilistic states, $s_e \in S_n$ is a nondeterministic escape state ($s_e
\notin S_s$), $S = S_n \cup S_p$ is the set of all states, $\tau \subseteq S \times S$ is a transition relation, $\pi : S_p \to S \to \mathbb{R}$ are transition
probabilities, $S_0 \subseteq S$ is a set of initial states, $S_s \subseteq S$ is a set of success states, and $L : S \to 2^{AP}$ is a labeling of
states with a set of propositions true in that state.

A probabilistic attack graph distinguishes between nondeterministic states (set $S_n$) and
probabilistic states (set $S_p$). Moreover, the sets of nondeterministic and probabilistic states are disjoint ($S_n
\cap S_p = \emptyset$). The function $\pi$ specifies probabilities of transitions from probabilistic states, so that for all
transitions $s_1 \to s_2 \in \tau$ such that $s_1 \in S_p$, we have $P(s_1 \to s_2) = \pi(s_1)(s_2) > 0$. Thus, $\pi(s)$ can be viewed as a
probability distribution on next states. Intuitively, when the system is in a nondeterministic state $s_n$, we
have no information about the relative probabilities of the possible next transitions. When the system is in a
probabilistic state $s_p$, it will choose the next state according to the probability distribution $\pi(s_p)$.

Let $G = (S, \tau, S_0, S_s, L)$ be the attack graph and $P$ a function that assigns probabilities to
transitions. The probabilities can be loosely interpreted as the probability of the atomic attack
corresponding to the transition succeeding. We are interested in finding the reliability of the attack graph,
i.e., the probability that the intruder will not succeed. We can view $G$ as a Markov chain with $S$ as its state
space and $P(s_1 \to s_2)$ as its transition probability. Let $U : S \to \mathbb{R}^+$ be the steady state probability of the
Markov chain (see [Dur95] for definitions and technical conditions). In this case, the reliability of the
attack graph $G$ is given by the following expression:

$1 - \sum_{s \in S_s} U(s)$


In other words, the reliability is the probability that in the “long run” the Markov chain will not be in a state
in the set $S_s$.
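As a rough illustration of this reliability measure, the sketch below pushes an initial distribution through a fully probabilistic graph for a fixed number of steps and then subtracts the mass sitting in success states from 1. It assumes that success and escape states are absorbing (modeled with self-loops of probability 1) and that the distribution has settled after the given number of iterations; the technical conditions under which a steady state exists are not checked.

```python
def long_run_distribution(P, initial, iters=1000):
    """Propagate a distribution over states through the transition
    probabilities P for `iters` steps.

    P       -- dict state -> dict {successor: probability}
    initial -- dict state -> initial probability mass
    """
    dist = dict(initial)
    for _ in range(iters):
        nxt = {}
        for state, mass in dist.items():
            for succ, prob in P[state].items():
                nxt[succ] = nxt.get(succ, 0.0) + mass * prob
        dist = nxt
    return dist


def reliability(P, initial, success, iters=1000):
    """1 minus the long-run probability of being in a success state."""
    dist = long_run_distribution(P, initial, iters)
    return 1.0 - sum(dist.get(s, 0.0) for s in success)
```

For a chain in which the intruder passes one step with probability 0.5 and the next with probability 0.4, the long-run success mass is 0.2, giving a reliability of 0.8.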

In general, however, we do not have probabilities assigned to all transitions; thus in Section 6.2 we
show how to perform similar reliability analysis on probabilistic attack graphs in the presence of
nondeterministic states. The justification of our approach relies on converting a probabilistic attack graph
(PAG) into an alternating probabilistic attack graph (APAG) and then interpreting the result as a Markov
Decision Process. We give this construction and interpretation in Section 6.3, and the proof of
correctness of the MDP value iteration algorithm applied to PAGs in Section 6.4. Sections 6.3 and 6.4 can
be skipped upon a first reading.

6.2  Reliability  Analysis  of  PAGs 

Assume that we are given a PAG tuple $G = (S_n, S_p, s_e, S, \tau, \pi, S_0, S_s, L)$. Intuitively, we are
interested in finding out the probability that the intruder will reach a success state starting from one of the
initial states. As shown above, in the absence of nondeterministic states we can compute this metric by
using the steady state probabilities of the Markov chain. In the presence of nondeterministic states the
intruder will choose transitions in order to maximize his probability of succeeding. For example, if an
intruder reaches a nondeterministic state $s$ with transitions to $s_1, \ldots, s_k$, he will choose to transition to the state
$s_i$ $(1 \le i \le k)$ that maximizes his probability of reaching a success state. This idea can be “formalized”
using concepts from the theory of Markov Decision Processes [Alt99, Put94].

6.2.1  Value  Iteration  for  PAGs 

Given a state $s \in S$, the set of successors of $s$ is denoted by $succ(s)$. Formally, $succ(s)$ is equal to $\{s' \mid
(s, s') \in \tau\}$. First, we define a value function $V : S \to \mathbb{R}^+$. For all $s \in S_s$, $V(s) = 1.0$. For all states $s \in S \setminus S_s$ the
value function is iterated according to the following equations until convergence.

$V(s) = \max_{s' \in succ(s)} V(s')$ if $s \in S_n \setminus S_s$

$V(s) = \sum_{s' \in succ(s)} \pi(s)(s') \, V(s')$ if $s \in S_p \setminus S_s$

Let $V^*$ be the value function after convergence. Intuitively, $\sum_{s \in S_0} V^*(s)$ is the probability for the
intruder to reach a success state if he “breaks” the nondeterminism to maximize the probability of
succeeding. Therefore, the worst case reliability of the network is $1 - \sum_{s \in S_0} V^*(s)$. This algorithm is known
as value iteration. The justification of the value iteration algorithm as applied to PAGs is presented in
Section 6.4.
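The iteration can be sketched as follows. This is an illustrative sketch rather than the post-processor's code: the initial value of 0 for non-success states, the convergence tolerance, the iteration cap, and the dictionary-based representation are all assumptions.

```python
def value_iteration(nondet, prob, success, succ, pi, tol=1e-9, max_iters=10_000):
    """Iterate the value function of a probabilistic attack graph.

    nondet, prob -- disjoint sets of nondeterministic / probabilistic states
    success      -- set of success states (their value is fixed at 1.0)
    succ         -- dict state -> list of successor states
    pi           -- dict probabilistic state -> dict {successor: probability}
    """
    states = set(nondet) | set(prob)
    value = {s: (1.0 if s in success else 0.0) for s in states}
    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            if s in success:
                continue
            if s in nondet:
                # The intruder resolves nondeterminism in his own favor.
                new = max((value[t] for t in succ.get(s, [])), default=0.0)
            else:
                # Expected value over the probabilistic successors.
                new = sum(pi[s][t] * value[t] for t in succ.get(s, []))
            delta = max(delta, abs(new - value[s]))
            value[s] = new
        if delta < tol:
            break
    return value
```

With a nondeterministic initial state that can move to one probabilistic state succeeding with probability 0.3 or another succeeding with probability 0.1 (hypothetical numbers), the initial state converges to 0.3, so the worst-case reliability is 0.7.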

6.2.2  Example  Revisited 

We implemented the value iteration algorithm in our attack graph post-processor and ran it on a
slightly modified version of the intrusion detection example from Section 4. In the modified example, each
attack has both detectable and stealthy variants. The intruder chooses which atomic attack to try next, and
he has a certain probability of picking a stealthy or a detectable variant. We assigned imaginary
probabilities of picking a stealthy attack variant as follows: 0.2 for sshd buffer overflow, 0.5 for ftp .rhosts,
0.05 for the other.

In this setup, the computed probability of intruder success is 0.2, and his best strategy is to attempt
sshd buffer overflow on host ip1, and then conduct the rest of the attack from that host. The only possibility
of detection is the sshd buffer overflow attack itself, since the IDS does not see the activity between hosts
ip1 and ip2. Given this context, a system administrator can answer the following question:

Question  4:  The  deployment  of  which  security  measure(s)  will  increase  the  likelihood  of  thwarting  an 
attacker? 

Answer: Installing an additional IDS component to monitor the network traffic between hosts ip1 and ip2
reduces the probability of the intruder remaining undetected to 0.025; installing a host-based IDS on host
ip2 reduces the probability to 0.16. Other things being equal, this is an indication that the former remedy is
more effective.
6.3 Alternating Probabilistic Attack Graphs and Markov Decision Processes

In this section we show that probabilistic attack graphs can be reduced to Markov Decision
Processes (without the reward function). We then demonstrate how we can assign a reward function to
attack graphs such that standard MDP algorithms can be used to compute the reliability metric of the network
being modeled.


Definition  4  [Alt99,  Put94]  A  Markov  Decision  Process  is  a  tuple  (X,  A,  P,  c)  where 

• $X$ is a finite state space. Generic notation for MDP states will be $x, y, z$.

• $A$ is a finite set of actions. $A(x) \subseteq A$ denotes the actions that are available at state $x$. The set $K = \{(x, a)
: x \in X, a \in A(x)\}$ is the set of state-action pairs. A generic notation for an action will be $a$.

• $P : X \times A \times X$ are the transition probabilities; thus, $P(x, a, y)$ (also written as $P_{xay}$) is the probability
of moving from state $x$ to $y$ if action $a$ is chosen.

• $r : X \to \mathbb{R}$ is an immediate reward. Cost may be equivalently viewed as a negative reward. We will
freely use the term cost to mean negative reward, and vice versa.

An execution fragment (also known as history in the traditional MDP literature) of an MDP is a sequence
$x_0 a_1 x_1 \ldots a_n x_n$ of alternating states and actions such that the sequence begins and ends with a state, and for all
$0 < k \le n$, $a_k \in A(x_{k-1})$ and $0 < P(x_{k-1}, a_k, x_k) \le 1$. Given an execution fragment $e = x_0 a_1 x_1 \ldots a_n x_n$, the
probability of the execution fragment (denoted by $P(e)$) is given by the following expression:

$P(e) = \prod_{k=1}^{n} P(x_{k-1}, a_k, x_k)$
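The probability of an execution fragment is simply the product of the transition probabilities along it, which the following sketch computes. The flat-list encoding of a fragment and the dictionary keyed by `(state, action, next_state)` are assumptions made for this example.

```python
from math import prod


def fragment_probability(fragment, P):
    """Probability of an execution fragment [x0, a1, x1, ..., an, xn].

    P -- dict mapping (state, action, next_state) to a transition probability
    """
    if len(fragment) % 2 == 0:
        raise ValueError("a fragment must begin and end with a state")
    # Consecutive (x_{k-1}, a_k, x_k) triples along the fragment.
    steps = [
        (fragment[i], fragment[i + 1], fragment[i + 2])
        for i in range(0, len(fragment) - 2, 2)
    ]
    return prod(P[step] for step in steps)
```

A fragment consisting of a single state has no transitions, so its probability is the empty product, 1.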


It  is  possible  to  convert  a  probabilistic  attack  graph  into  an  MDP  such  that  the  behaviors  of  the 
PAG  and  the  MDP  are  identical.  To  explain  the  conversion  procedure,  we  define  a  restricted  kind  of 
probabilistic  attack  graph. 

Definition 5 An alternating probabilistic attack graph or APAG is a tuple $G = (S_n, S_p, s_e, S, \tau_n, \tau_p, \pi, S_0, S_s, L)$,
where $S_n$ is a set of nondeterministic states, $S_p$ is a set of probabilistic states, $s_e \in S_n$ is a nondeterministic
escape state, $S = S_n \cup S_p$ is the set of all states, $\tau_n \subseteq S_n \times S_p$ is a set of nondeterministic transitions, $\tau_p \subseteq S_p
\times S_n$ is a set of probabilistic transitions, $\pi : S_p \to S_n \to \mathbb{R}$ are transition probabilities, $S_0 \subseteq S$ is a set of initial
states, $S_s \subseteq S$ is a set of success states, and $L : S \to 2^{AP}$ is a labeling of states with a set of propositions true
in that state.

Figure 8: Converting PAG to APAG

An  alternating  probabilistic  attack  graph  (APAG)  does  not  have  any  transitions  between  two 
nondeterministic  or  between  two  probabilistic  states.  In  other  words,  a  nondeterministic  state  has 
transitions  to  probabilistic  states  only,  and  vice  versa.  An  execution  of  an  APAG  will  always  have  strictly 
alternating  nondeterministic  and  probabilistic  states. 

Next we describe an algorithm that converts a PAG $G_p = (S_n, S_p, s_e, S, \tau, \pi, S_0, S_s, L)$ into an APAG $G_p^a = (S_n^a, S_p^a, s_e, S^a, \tau_n^a, \tau_p^a, \pi^a, S_0, S_s, L^a)$ that has equivalent behaviors. The algorithm works by adding hidden states and transitions to the graph such that every execution becomes strictly alternating, yet does not change its observable (non-hidden) components.

We start with $S_n^a = S_n$, $S_p^a = S_p$, $\tau_n^a = \emptyset$, $\tau_p^a = \emptyset$, $\pi^a = \emptyset$, and $L^a = L$. Next,

1. Whenever $\tau$ has a transition from probabilistic state $s_1$ to nondeterministic state $s_2$, we add the transition to $\tau_p^a$ and its probability to $\pi^a$.

2. Whenever $\tau$ has a transition from nondeterministic state $s_1$ to probabilistic state $s_2$, we add the transition to $\tau_n^a$.

3. Whenever $\tau$ has a transition between two nondeterministic states $s_1$ and $s_2$, we add a hidden probabilistic state $s_h$ to $S_p^a$, an observable transition $s_1 \to s_h$ to $\tau_n^a$, and a hidden transition $s_h \to s_2$ to $\tau_p^a$, assigning the latter probability 1.0 in $\pi^a$ (Figure 8(a)). We also set $L^a(s_h) = L(s_2)$.

4. Whenever $\tau$ has a transition between two probabilistic states $s_1$ and $s_2$, we add a hidden nondeterministic state $s_h$ to $S_n^a$, a hidden transition $s_h \to s_2$ to $\tau_n^a$, and an observable transition $s_1 \to s_h$ to $\tau_p^a$, assigning the latter the original probability $p$ of going from $s_1$ to $s_2$ (Figure 8(b)). We also set $L^a(s_h) = L(s_1)$.
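
The four cases of the conversion can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in our tool suite: the set/dictionary encoding of the graph, the function name `pag_to_apag`, and the naming of hidden states as `("hidden", s1, s2)` tuples are all assumptions.

```python
def pag_to_apag(nondet, prob, tau, pi, label):
    """Convert a PAG into an alternating PAG by inserting hidden states.

    nondet, prob: sets of nondeterministic / probabilistic states
    tau:          set of transitions (s1, s2)
    pi:           dict (s1, s2) -> probability, for s1 in prob
    label:        dict state -> set of propositions
    """
    nondet_a, prob_a = set(nondet), set(prob)
    tau_n, tau_p, pi_a = set(), set(), {}
    label_a = dict(label)
    hidden = set()
    for s1, s2 in sorted(tau):
        if s1 in prob and s2 in nondet:        # case 1: already alternating
            tau_p.add((s1, s2))
            pi_a[(s1, s2)] = pi[(s1, s2)]
        elif s1 in nondet and s2 in prob:      # case 2: already alternating
            tau_n.add((s1, s2))
        elif s1 in nondet and s2 in nondet:    # case 3: hidden probabilistic state
            h = ("hidden", s1, s2)
            prob_a.add(h)
            hidden.add(h)
            tau_n.add((s1, h))                 # observable transition
            tau_p.add((h, s2))                 # hidden transition, probability 1.0
            pi_a[(h, s2)] = 1.0
            label_a[h] = label[s2]
        else:                                  # case 4: hidden nondeterministic state
            h = ("hidden", s1, s2)
            nondet_a.add(h)
            hidden.add(h)
            tau_n.add((h, s2))                 # hidden transition
            tau_p.add((s1, h))                 # observable, keeps original probability
            pi_a[(s1, h)] = pi[(s1, s2)]
            label_a[h] = label[s1]
    return nondet_a, prob_a, tau_n, tau_p, pi_a, label_a, hidden
```

After the conversion every transition in `tau_n` leads from a nondeterministic to a probabilistic state and every transition in `tau_p` leads the other way, which is the alternation property required by Definition 5.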

Let $G_p$ be a PAG and $G_p^a$ be the corresponding APAG. An execution fragment $e = s_0 s_1 \ldots s_n$ in $G_p^a$ is called proper if the start and end states ($s_0$ and $s_n$) are observable states. Let $e$ be a proper execution fragment of $G_p^a$. We define $obs(e)$ by removing hidden states and hidden transitions from $e$, i.e., restricting the execution to observable states and transitions. Consider an execution fragment $e = s_0 s_1 \ldots s_n$. Let $S_p(e)$ be the set of probabilistic states in the set $\{s_0, \ldots, s_{n-1}\}$. Define the probability of an execution fragment $e$ (denoted by $P(e)$) as:

$$P(e) = \prod_{s_i \in S_p(e)} P(s_i \to s_{i+1})$$
In other words, the probability of an execution fragment is the product of the probabilities of the probabilistic transitions in it. The following lemma follows straight from the construction.


Lemma 7 Let $G_p$ be a PAG and $G_p^a$ be the corresponding APAG. Let $e$ be a proper execution fragment of $G_p^a$. The following three statements are true:

1. $obs(e)$ is an execution fragment of $G_p$.

2. $P(e) = P(obs(e))$, where the first probability is interpreted in $G_p^a$ and the second probability is interpreted in $G_p$.

3. For all execution fragments $e_1$ of $G_p$ there exists a proper execution fragment $e$ in $G_p^a$ such that $e_1 = obs(e)$.

Figure  9:  Converting  an  APAG  to  a  MDP 

Lemma 7 clearly shows that there is a one-to-one correspondence (given by $obs$) between proper execution fragments of an APAG and corresponding execution fragments of a PAG. Moreover, this correspondence preserves probabilities. We have shown that APAGs have the same expressive power as PAGs, so hereafter we consider them interchangeable.

An APAG $(S_n, S_p, s_e, S, \tau_n, \tau_p, \pi, S_0, S_s, L)$ has a direct interpretation as an MDP $M_G = (X, A, P, r)$, where $X = S_n$, $A = \tau_n$. That is, each action in the MDP represents a transition from a nondeterministic to a probabilistic state. Further, let $x, y \in X$ and $a \in A(x)$, so that $a$ represents a transition from $x$ to some probabilistic state $s_p$ in the APAG. Then we have $P(x, a, y) = \pi(s_p)(y)$.
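
The interpretation of an APAG as an MDP can be sketched as below. The encoding is an assumption made for illustration: $\tau_n$ is a set of pairs, $\pi$ is a dictionary keyed by (probabilistic state, successor), and each MDP action is represented by the nondeterministic transition it stands for. The function name `apag_to_mdp` is hypothetical.

```python
def apag_to_mdp(nondet, tau_n, pi):
    """Build the state set X, the action sets A(x), and P(x, a, y) from an APAG.

    nondet: set of nondeterministic states (becomes X)
    tau_n:  set of transitions (x, s_p) from nondeterministic to probabilistic states
    pi:     dict (s_p, y) -> probability of moving from s_p to y
    """
    X = set(nondet)
    actions = {x: [] for x in X}
    P = {}
    for x, s_p in sorted(tau_n):
        a = (x, s_p)                 # the action is the transition itself
        actions[x].append(a)
        for (q, y), p in pi.items():
            if q == s_p:
                P[(x, a, y)] = p     # P(x, a, y) = pi(s_p)(y)
    return X, actions, P

# Hypothetical APAG: from n0 the intruder picks one of two probabilistic states.
X, actions, P = apag_to_mdp(
    {"n0", "goal", "fail"},
    {("n0", "pa"), ("n0", "pb")},
    {("pa", "goal"): 0.6, ("pa", "fail"): 0.4,
     ("pb", "goal"): 0.9, ("pb", "fail"): 0.1},
)
print(actions["n0"])
```

Probabilistic states do not appear in `X`; they survive only as the targets named inside the actions.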

It is preferable to have all APAG success states represented explicitly as MDP states, so that we can reason about attacks in the MDP context. For this reason, we add a hidden nondeterministic state (and a transition thereto) to every probabilistic success state in the APAG. We omit proofs of equivalence of an APAG before and after this modification.

Figure 9(a) shows an example APAG, with the corresponding MDP shown in Figure 9(b). The nondeterministic transitions from the root node in the APAG are represented by the MDP actions a, b, and c. The leftmost leaf in the APAG is a probabilistic success state; in the MDP it is represented by the appended hidden nondeterministic state.

This, however, plays a role in our interpretation of results obtained through MDP algorithms.
Finally,  we  can  choose  the  reward  function  r  depending  on  the  questions  we  are  trying  to  answer. 
Let $e = s_0^n s_1^p s_1^n \ldots s_{k-1}^n s_k^p s_k^n$ be an execution fragment of the APAG $G$, where $s_i^n$ and $s_i^p$ represent nondeterministic and probabilistic states respectively. Let $mdp(e) = s_0^n t_1^n s_1^n \ldots t_k^n s_k^n$, where $t_i^n$ is the action that corresponds to the transition $s_{i-1}^n \to s_i^p$. Notice that in $mdp(e)$ probabilistic states do not occur.

The  proof  of  the  following  lemma  follows  straight  from  the  construction. 

Lemma 8 Let $G$ be an APAG and $M_G$ be the corresponding MDP. Let $e$ be an execution fragment of $G$ and $mdp(e)$ be the corresponding execution fragment in the MDP $M_G$. The following statements are true.

1. $mdp(e)$ is an execution fragment of the MDP $M_G$.

2. $P(e) = P(mdp(e))$, where $P(e)$ and $P(mdp(e))$ are interpreted in $G$ and $M_G$ respectively.

3. For all execution fragments $e_1$ in the MDP $M_G$, there exists an execution fragment $e$ in $G$ such that $mdp(e) = e_1$.

6.4  Correctness  of  the  Value  Iteration  Algorithm  for  Attack  Graphs 

Let $G = (S_n, S_p, s_e, S, \tau, \pi, S_0, S_s, L)$ be a PAG, and $G^a = (S_n^a, S_p^a, s_e, S^a, \tau_n^a, \tau_p^a, \pi^a, S_0, S_s, L^a)$ be the corresponding APAG. Recall that the APAG $G^a$ is obtained from the PAG $G$ by adding hidden states whenever there is a transition between two nondeterministic or probabilistic states (see Section 6.3). An APAG $G = (S_n, S_p, s_e, S, \tau_n, \tau_p, \pi, S_0, S_s, L)$ has a direct interpretation as an MDP $M_G = (X, A, P, r)$, where $X = S_n$, $A = \tau_n$. That is, each action in the MDP represents a transition from a nondeterministic to a probabilistic state. Further, let $x, y \in X$ and $a \in A(x)$, so that $a$ represents a transition from $x$ to some probabilistic state $s_p$ in the APAG. Then we have $P(x, a, y) = \pi(s_p)(y)$. We first demonstrate that the value iteration algorithm (or VI for short) on the APAG $G^a$ is simply a transformed version of the value iteration algorithm on the corresponding MDP $M_G$ with an appropriate reward function $r$. After that, we prove that the value iteration algorithm on the PAG and the corresponding APAG converge to the same value. The advantage of this approach is that all the technical results in the context of value iteration in MDPs can be directly applied to value iteration in PAGs [Put94, Chapter 9].

6.4.1  Correspondence  Between  Value  Iteration  in  MDPs  and  APAGs 

Consider an MDP $M = (X, A, P, r)$. A value function is a positive real-valued function $V : X \to \mathbb{R}^+$.

The value iteration algorithm uses the following equation to update the function $V$:

$$V(x) = \max_{a \in A(x)} \Big[ r(x, a) + \sum_{y \in X} P(x, a, y)\, V(y) \Big]$$

Technical  conditions  that  guarantee  the  convergence  of  the  value  iteration  algorithm  can  be  found  in 
[Put94,  Chapter  9]. 
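
As a concrete illustration of the update rule, the following Python sketch applies it synchronously for a fixed number of sweeps. It is a minimal sketch under stated assumptions: the dictionary encoding, the name `value_iteration`, the zero initial value function, and the fixed sweep count are choices made here, and the sketch does not check the convergence conditions referred to above.

```python
def value_iteration(states, actions, P, r, sweeps=100):
    """Apply V(x) <- max_a [ r(x, a) + sum_y P(x, a, y) V(y) ] for a fixed number of sweeps.

    states:  list of states
    actions: dict state -> list of actions available there, i.e. A(x)
    P:       dict (x, a, y) -> probability; missing triples count as 0
    r:       dict (x, a) -> reward; missing pairs count as 0
    """
    V = {x: 0.0 for x in states}
    for _ in range(sweeps):
        V = {
            x: max(
                (
                    r.get((x, a), 0.0)
                    + sum(P.get((x, a, y), 0.0) * V[y] for y in states)
                    for a in actions[x]
                ),
                default=0.0,  # a state with no actions keeps value 0 (sketch assumption)
            )
            for x in states
        }
    return V

# Hypothetical MDP: "s" reaches "goal" with probability 0.8; leaving "goal" pays reward 1.
states = ["s", "goal", "dead", "done"]
actions = {"s": ["try"], "goal": ["finish"], "dead": ["stay"], "done": ["stay"]}
P = {("s", "try", "goal"): 0.8, ("s", "try", "dead"): 0.2,
     ("goal", "finish", "done"): 1.0,
     ("dead", "stay", "dead"): 1.0, ("done", "stay", "done"): 1.0}
r = {("goal", "finish"): 1.0}
print(value_iteration(states, actions, P, r)["s"])
```

In this example the reward is collected exactly once, on leaving the goal state, so the value of a state coincides with the probability of eventually reaching the goal from it.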

Let $G^a$ be an APAG and $M_G$ be the corresponding MDP. Recall that we assumed that all success states in $G^a$ are nondeterministic states so that they are explicitly represented in the MDP $M_G$. Before we proceed, we need to slightly modify the MDP $M_G$. We add a new state $s_{new}$ and action $a_{new}$ to the MDP $M_G$. The only action allowed from $s_{new}$ is $a_{new}$ ($A(s_{new}) = \{a_{new}\}$) and $P(s_{new}, a_{new}, s_{new}) = 1.0$ (so by definition $P(s_{new}, a_{new}, s) = 0.0$ if $s \neq s_{new}$). Moreover, we add the action $a_{new}$ to the action set corresponding to the success states $S_s$ and for all $s \in S_s$ we have $P(s, a_{new}, s_{new}) = 1.0$ (so by definition $P(s, a_{new}, s') = 0.0$ if $s' \neq s_{new}$). We have the following reward function $r$:

$$r(s, a) = \begin{cases} 1.0 & \text{if } s \in S_s \text{ and } a = a_{new} \\ 0.0 & \text{otherwise} \end{cases}$$

We start with a value function that assigns 0.0 to every state. It is easy to see that the value function assigns 0.0 to the newly added state $s_{new}$ (its only action has reward 0.0 and leads back to $s_{new}$) and 1.0 to a state in the set $S_s$. For states that are not in the set $\{s_{new}\} \cup S_s$ the value function $V$ changes according to the following equation:
$$V(x) = \max_{a \in A(x)} \sum_{y \in X} P(x, a, y)\, V(y) = \max_{s_p \in succ(x)} \sum_{y \in X} P(s_p \to y)\, V(y)$$

The second equality follows from the construction of the MDP $M_G$ from the APAG $G^a$. Recall that actions in the MDP correspond to the transitions from nondeterministic to probabilistic states. Next we extend the value function $V$ to probabilistic states $S_p$ by defining $V(s)$ (for all $s \in S_p$) as:

$$V(s) = \sum_{y \in X} P(s \to y)\, V(y)$$

Notice that in an APAG the only successors of a probabilistic state $s$ are nondeterministic states, so $V(y)$ is well defined. Using this definition the value iteration algorithm can be rewritten as:

$$V(s) = \begin{cases} \max_{s' \in succ(s)} V(s') & \text{if } s \in S_n \setminus S_s \\ \sum_{s' \in succ(s)} P(s \to s')\, V(s') & \text{if } s \in S_p \setminus S_s \end{cases}$$


The  value  iteration  (VI)  equation  given  above  was  obtained  by  transforming  the  VI  equation  for 
the  corresponding  MDP.  Moreover,  the  equation  we  obtain  is  exactly  the  VI  equation  for  an  APAG  that 
was  provided  earlier  (see  Section  6.2). 
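
The rewritten update can be run directly on the graph, without building the MDP. The sketch below is illustrative only: the encoding of the APAG, the name `apag_value_iteration`, the initialization (1.0 on success states, 0.0 elsewhere), and the fixed number of sweeps are assumptions.

```python
def apag_value_iteration(nondet, prob, succ, pi, success, sweeps=50):
    """Value iteration on an APAG.

    nondet, prob: sets of nondeterministic / probabilistic states
    succ:         dict state -> list of successor states
    pi:           dict (s, s') -> probability, for s in prob
    success:      set of success states (their value is fixed at 1.0)
    """
    V = {s: (1.0 if s in success else 0.0) for s in nondet | prob}
    for _ in range(sweeps):
        new = {}
        for s in V:
            if s in success:
                new[s] = 1.0
            elif s in nondet:
                # Nondeterministic state: best successor value.
                new[s] = max((V[t] for t in succ.get(s, ())), default=0.0)
            else:
                # Probabilistic state: expected successor value.
                new[s] = sum(pi[(s, t)] * V[t] for t in succ.get(s, ()))
        V = new
    return V

# Hypothetical APAG with one nondeterministic choice between two exploits.
V = apag_value_iteration(
    nondet={"n0", "goal", "fail"},
    prob={"pa", "pb"},
    succ={"n0": ["pa", "pb"], "pa": ["goal", "fail"], "pb": ["goal", "fail"]},
    pi={("pa", "goal"): 0.6, ("pa", "fail"): 0.4,
        ("pb", "goal"): 0.9, ("pb", "fail"): 0.1},
    success={"goal"},
)
print(V["n0"])
```

The value at a nondeterministic state is then the best success probability the intruder can obtain from that state, given the annotated probabilities.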

6.4.2 Correspondence Between Value Iteration in MDPs and PAGs

Let $G = (S_n, S_p, s_e, S, \tau, \pi, S_0, S_s, L)$ be a PAG, and $G^a = (S_n^a, S_p^a, s_e, S^a, \tau_n^a, \tau_p^a, \pi^a, S_0, S_s, L^a)$ be the corresponding APAG. Recall that $G^a$ is obtained from $G$ by adding hidden states whenever there is a transition between two nondeterministic or probabilistic states (see Figure 8). Suppose there is a transition between two nondeterministic states $s_1$ and $s_2$ in $G$. In $G^a$, we add a new probabilistic state $s_h$ and add transitions $s_1 \to s_h$ and $s_h \to s_2$, where the probability of the transition $s_h \to s_2$ is 1.0. Consider the $i$-th iteration of the VI algorithm in $G$. In this case, the value $V(s_2)$ in the $(i-1)$-th iteration is used to update the value of the state $s_1$. Now consider the value iteration algorithm in $G^a$. The value $V(s_h)$ of the hidden state $s_h$ in the $(i-1)$-th iteration is used to update the value of $V(s_1)$ in the $i$-th iteration. It is easy to see that $V(s_h)$ in the $(i-1)$-th iteration is $V(s_2)$ in the $(i-2)$-th iteration. Therefore, hidden states add a delay of 1 in the value iteration algorithm. The case for a transition between two probabilistic states is analogous.

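Consider a PAG $(S_n, S_p, s_e, S, \tau, \pi, S_0, S_s, L)$. The equation for the value iteration algorithm without delay is:

$$V_i(s) = \begin{cases} 1.0 & \text{if } s \in S_s \\ \max_{s' \in succ(s)} V_{i-1}(s') & \text{if } s \in S_n \setminus S_s \\ \sum_{s' \in succ(s)} P(s \to s')\, V_{i-1}(s') & \text{if } s \in S_p \setminus S_s \end{cases}$$

We have added the iteration index $i$ to the VI algorithm so that we can refer to it in the proof. The value iteration algorithm with the delay is:

$$V_i^d(s) = \begin{cases} 1.0 & \text{if } s \in S_s \\ \max\Big\{ \max_{s' \in succ(s) \cap S_n} V_{i-2}^d(s'),\ \max_{s' \in succ(s) \cap S_p} V_{i-1}^d(s') \Big\} & \text{if } s \in S_n \setminus S_s \\ \sum_{s' \in succ(s) \cap S_p} P(s \to s')\, V_{i-2}^d(s') + \sum_{s' \in succ(s) \cap S_n} P(s \to s')\, V_{i-1}^d(s') & \text{if } s \in S_p \setminus S_s \end{cases}$$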

Initially, both sequences start with the value functions $V_0$ and $V_0^d$ that assign 1.0 to states in $S_s$ and 0.0 to all other states. Notice that in the value iteration algorithm for $V_i^d$ there is a delay of 1 added (the $(i-2)$-th value is used for successors of the same kind as $s$). Both sequences are monotonic: for all $s \in S$, $V_{i-1}(s) \le V_i(s)$ and $V_{i-1}^d(s) \le V_i^d(s)$. The following inequality also holds for all $s \in S$ and $i \ge 2$:

$$V_i(s) \ge V_i^d(s) \ge V_{i-2}(s)$$

The inequality given above directly follows from the monotonicity property and the equations that define value iteration.

Suppose $V_i$ converges to $V_*$ pointwise, i.e., for all $s \in S$, $V_i(s) \to V_*(s)$. Next we prove that for all $s \in S$, if $V_i(s) \to V_*(s)$, then $V_i^d(s) \to V_*(s)$. This proves that $V_i^d$ also converges to $V_*$. By definition of convergence, for all $\epsilon > 0$, there exists a positive integer $N(\epsilon)$ such that for all $i \ge N(\epsilon)$ we have:

$$|V_*(s) - V_i(s)| < \epsilon$$

Assume that we are given an $\epsilon > 0$. It is easy to see that the limit $V_*(s) \ge V_i(s)$ for all $i$ (this follows from the fact that $V_i(s)$ is a monotonic sequence). Therefore, we have the following inequality:

$$|V_*(s) - V_i^d(s)| \le |V_*(s) - V_{i-2}(s)|$$

The inequality given above follows from the inequality $V_i^d(s) \ge V_{i-2}(s)$ for all $s$. Since $V_i(s) \to V_*(s)$, there exists an $N(\epsilon)$ such that if $i \ge N(\epsilon)$, then:

$$|V_*(s) - V_i(s)| < \epsilon$$

By the argument given above $|V_*(s) - V_i^d(s)| < \epsilon$ for $i \ge N(\epsilon) + 2$. This proves that $V_i^d(s) \to V_*(s)$. Conversely assume that $V_i^d$ converges to $V_*^d$. Using the inequality given below it is easy to prove that $V_i(s) \to V_*^d(s)$.

$$|V_*^d(s) - V_i(s)| \le |V_*^d(s) - V_i^d(s)|$$

Therefore,  we  prove  that  the  value  iteration  algorithm  with  and  without  delay  converge  to  the 
same  value.  The  VI  algorithm  with  delay  is  essentially  the  VI  algorithm  on  the  APAG  which  was 
derived  from  the  VI  algorithm  on  the  corresponding  MDP.  Therefore,  the  correctness  of  the  VI  algorithm 
on  the  PAG  G  follows. 
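
The effect of the delay can be observed on a small example. The Python sketch below implements one step of value iteration on a PAG, optionally reading the $(i-2)$-th value for successors of the same kind as the current state, as in the delayed equations of this section. It is a sketch under assumptions: the graph encoding, the names `vi_step` and `run_vi`, the convention that the value before iteration 0 equals the initial value function, and the example graph are all hypothetical.

```python
def vi_step(V1, V2, nondet, succ, pi, success, delayed):
    """One VI step on a PAG; V1 holds iteration i-1 and V2 holds iteration i-2."""
    new = {}
    for s in V1:
        if s in success:
            new[s] = 1.0
            continue
        terms = []
        for t in succ.get(s, ()):
            same_kind = (s in nondet) == (t in nondet)
            prev = V2 if (delayed and same_kind) else V1   # hidden state adds a delay of 1
            weight = 1.0 if s in nondet else pi[(s, t)]
            terms.append(weight * prev[t])
        new[s] = max(terms, default=0.0) if s in nondet else sum(terms)
    return new

def run_vi(nondet, prob, succ, pi, success, iterations, delayed):
    V1 = V2 = {s: (1.0 if s in success else 0.0) for s in nondet | prob}
    for _ in range(iterations):
        V1, V2 = vi_step(V1, V2, nondet, succ, pi, success, delayed), V1
    return V1

# Hypothetical PAG containing one n->n transition (n0->n1) and one p->p transition (p->q).
nondet, prob = {"n0", "n1", "goal", "fail"}, {"p", "q"}
succ = {"n0": ["n1"], "n1": ["p"], "p": ["goal", "q"], "q": ["goal", "fail"]}
pi = {("p", "goal"): 0.5, ("p", "q"): 0.5, ("q", "goal"): 0.4, ("q", "fail"): 0.6}
plain = run_vi(nondet, prob, succ, pi, {"goal"}, 20, delayed=False)
slow = run_vi(nondet, prob, succ, pi, {"goal"}, 20, delayed=True)
print(plain["n0"], slow["n0"])
```

On this example the delayed iteration needs more steps than the plain one to propagate values back to `n0`, but both settle on the same value function, which is the behavior the argument above establishes.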

7  Summary  of  Contributions  and  Future  Work 

Our  foremost  contribution  is  the  automatic  generation  of  attack  graphs.  Our  key  insight  is  that  an 
attack  is  equivalent  to  a  counterexample  produced  by  off-the-shelf  model  checkers;  the 
attack/counterexample  is  a  witness  to  a  violation  of  a  safety  property.  By  a  small,  but  critical  enhancement 
to  an  existing  model  checker,  i.e.,  NuSMV,  we  can  easily  produce  attack  graphs  automatically;  moreover, 
these  graphs  are  succinct  and  exhaustive.  A  by-product  of  this  part  of  our  work  is  showing,  by  example, 
what  level  of  abstraction  is  appropriate  for  modeling  attacks.  We  use  simple  state  machine  specifications  to 
model  not  just  intruder  behavior  (by  a  set  of  atomic  attacks),  but  also  normal  system  behavior,  system 
administrator  recovery  actions,  and  connectivity  (communication)  between  subsystems. 

Our  second  most  important  contribution  is  support  for  a  range  of  formal  analyses  of  attack  graphs. 
Security  analysts  use  attack  graphs  informally  for  attack  detection,  defense,  and  forensics.  In  this  paper,  we 
explain  how  they  can  now  use  our  minimization  analysis  technique  on  attack  graphs  to  more  precisely 
answer  questions  like  “Which  security  measure  should  I  deploy  in  order  to  thwart  this  set  of  attacks?”  and 
“Which  set  of  security  measures  should  I  deploy  to  guarantee  the  safety  of  my  system?”  To  do  reliability 
analysis,  we  annotate  attack  graphs  with  probabilities  and  then  interpret  them  as  Markov  Decision 
Processes  (MDP).  Then,  by  using  MDP  algorithms  such  as  value  iteration,  security  analysts  can  more 
precisely answer questions like “Will deploying this intrusion detection system increase or decrease the
likelihood  of  thwarting  this  type  of  attack?” 
On the theoretical front, we have so far restricted our work to only safety (invariant) properties. To exploit the full power of model checking, we need a method of generating attack graphs for more general classes of properties, such as liveness properties. An example of a liveness property is:


AG(server.user.request -> AF(server.user.access))

This  property  would  not  be  true  if  the  server  can  be  disabled  using  a  denial-of-service  attack. 
Another  such  liveness  property  is  that  a  legitimate  user’s  transaction  will  finish  despite  intruder 
interference.  We  plan  to  explore  generation  of  attack  graphs  for  universally  quantified  fragments  of 
Computation Tree Logic and Linear Temporal Logic. On the practical front, we plan to conduct larger
case  studies  to  illustrate  the  usefulness  of  automatically  generating  attack  graphs.  To  make  our  tool  suite 
more  usable  by  security  experts  and  system  administrators,  we  see  the  value  of  building  a  library  of 
specifications  of  atomic  attacks.  Our  hope  is  that  increasing  this  arsenal  of  specifications  outpaces  the 
growth  in  the  arsenal  of  known  attacks;  we  can  potentially  discover  new,  unexpected  attacks,  and  hence 
identify  new  network  vulnerabilities.  Finally,  we  also  intend  to  build  a  tool  that  merges  our  work  on  attack 
graphs with existing intrusion detection technologies. The tool is intended to help security analysts evaluate
and  enhance  the  security  of  a  network. 
Abstract


Survivability  is  the  ability  of  a  system  to  continue  operating  despite  the  presence  of  abnormal  events  such 
as  failures  and  intrusions.  Ensuring  system  survivability  has  increased  in  importance  as  critical 
infrastructures  have  become  heavily  dependent  on  computers.  In  this  paper  we  present  a  systematic 
method  for  performing  survivability  analysis  of  networked  systems.  An  architect  injects  failure  and 
intrusion  events  into  a  system  model  and  then  visualizes  the  effects  of  the  injected  events  in  the  form  of 
scenario  graphs.  Our  method  enables  further  global  analyses,  such  as  reliability,  latency,  and  cost-benefit 
analyses,  where  mathematical  techniques  used  in  different  domains  are  combined  in  a  systematic  manner. 
We  illustrate  our  ideas  on  an  abstract  model  of  the  United  States  Payment  System. 


Keywords:  survivability,  model  checking,  reliability  analysis,  cost  analysis,  Markov  Decision  Processes, 
fault-tolerance,  security 
1 Introduction and Motivation

Increasingly  our  critical  infrastructures  are  becoming  heavily  dependent  on  computers.  We  see 
examples  of  such  infrastructures  in  all  domains,  including  medical,  power,  telecommunications,  and 
finance.  Whereas  automation  provides  society  with  the  advantages  of  efficient  communication  and 
information  sharing,  the  pervasive,  continuous  use  of  computers  exposes  our  critical  infrastructures  to  a 
wider  variety  and  higher  likelihood  of  accidental  failures  and  malicious  attacks.  Disruption  of  services 
caused  by  such  undesired  events  can  have  catastrophic  effects,  including  loss  of  human  life. 

Survivability  is  the  ability  of  a  system  to  continue  operating  in  the  presence  of  accidental 
failures  or  malicious  attacks  [7].  We  use  the  term  fault  for  both  accidental  failures  (e.g.,  a  disk  crash)  and 
malicious attacks (e.g., a denial-of-service attack). The precise semantics of continuous operation is
application  dependent;  it  is  related  to  critical  services  that  the  system  provides.  For  example,  check 
clearing  is  a  critical  service  of  a  banking  system,  and  a  survivable  banking  system  will  continue  providing 
this  service  despite  the  presence  of  faults. 

In  this  paper  we  present  a  method  for  analyzing  a  networked  system  for  survivability.  A 
networked  system  consists  of  nodes  and  links  connecting  the  nodes.  Communication  between  the  nodes 
occurs  by  passing  messages  over  the  links.  An  event  in  the  system  can  be  either  a  user  event  (e.g.,  a  user 
issues  a  check),  an  internal  event  (e.g.,  a  user's  account  is  debited),  a  communication  event  (e.g.,  sending 
a  message  between  two  banks),  or  a  fault  (e.g.,  a  bank  under  a  malicious  attack).  A  service  is  associated 
with a start event (e.g., a user issues a check) and an end event (e.g., the check clears). The start event
and  the  end  event  correspond  respectively  to  when  "a  service  is  issued"  and  when  a  "service  is  finished." 

Our  main  goal  is  to  provide  information  to  the  system  architect  during  the  design  phase,  the 
early  planning  stage  of  the  software  lifecycle.  With  this  information,  the  architect  can  weigh  the  pros 
and  cons  of  decisions  related  to  survivability.  The  method  we  present  in  this  paper,  however,  is  just  as 
suitable  for  post  facto  analysis  of  existing  systems. 

Our  method  is  general  enough  to  support  many  different  types  of  analysis.  In  this  paper  we  focus 
on  three  specific  kinds  of  questions. 

Question  1:  What  is  the  effect  of  a  fault? 

Example:  Imagine  an  architect  is  designing  a  power  grid.  He  wants  to  know  the  effect  of  an  outage  of  a 
power  plant  located  in  upstate  New  York  on  customers  living  hundreds  of  miles  away  in  western 
Pennsylvania. 

Answer  (Fault-Effect  Analysis):  Using  our  method  the  architect  can  visualize  the  global  effect  of  a 
local  fault  through  a  data  structure  that  we  call  a  scenario  graph.  In  our  method,  we  automatically 
generate  scenario  graphs  using  model  checking. 

Question  2:  What  is  the  reliability  and  latency  of  a  service?  Here,  reliability  is  defined  as  the 
probability  that  a  service  that  has  been  issued  will  finish.  Latency  measures  the  expected  time  it  takes  a 
service  to  finish. 

Example:  Suppose  an  architect  designing  a  banking  system  wants  to  find  out  the  probability  that  a  check 
issued  actually  clears. 

Answer  (Reliability  and  Latency  Analysis):  To  find  the  reliability  of  the  banking  system  with  respect 
to  the  check  clearing  service,  we  query  an  annotated  scenario  graph.  The  architect  first  identifies  a  set  of 
"critical"  elements  in  the  network,  i.e.,  nodes  and  links  whose  failures  would  have  a  severe  effect  on  the 
provision  of  the  service  in  question.  He  then  assigns  probabilities  to  each  fault  (i.e.,  the  failure  of  each 
node  or  link).  Then,  using  our  method,  he  can  automatically  compute  both  the  reliability  and  latency 
of  the  network. 

Question  3:  Given  cost  constraints,  which  network  nodes/links  should  be  upgraded  to  maximize  benefit 
(e.g.,  reliability)? 
Example: Suppose an architect is allowed to spend newly allocated funds to upgrade a fraction of the
network's  links  to  newer  links  that  are  faster  and  more  reliable.  Given  the  constraints  imposed  by  his 
manager's  limited  budget,  which  links  should  he  choose  to  upgrade  to  maximize  the  network's  reliability? 

Answer  (Cost-Benefit  Analysis):  To  perform  a  cost-benefit  analysis,  we  further  extend  our  annotated 
scenario  graphs  with  additional  cost  information  related  to  upgrading  the  links.  We  then  can 
automatically  compute  how  to  maximize  a  given  benefit  given  a  set  of  cost  constraints. 

Survivability  analysis  is  fundamentally  different  from  analysis  of  properties  found  in  other  areas 
(e.g.,  algorithm  analysis  of  fault-tolerant  distributed  systems,  reliability  analysis  of  hardware  systems,  and 
"security"  analysis  of  computer  systems).  First,  survivability  analysis  must  handle  a  broader  range  of 
faults  than  any  of  these  other  areas;  we  must  minimally  handle  both  accidental  failures  and  malicious 
attacks.  To  achieve  this  goal  our  method  allows  an  architect  to  incorporate  any  arbitrary  type  of  fault  in 
the  system  model;  however,  we  still  allow  distinctions  among  faults  by  assigning  different  weights 
(e.g.,  probability  of  occurring,  cost  to  repair,  etc.)  to  each  fault. 

Second,  events  may  be  dependent  on  each  other,  especially  fault  events.  In  contrast,  for  ease  of 
analysis,  most  work  in  the  fault-tolerant  literature  makes  the  independence  assumption:  assume  that 
abnormal  events  are  independent.  We  cannot  make  this  assumption  in  analyzing  systems  for 
survivability.  For  example,  if  a  server  crashes,  then  it  is  easier  for  a  malicious  intruder  to  spoof  the  crashed 
server;  the  chance  that  an  intruder  will  succeed  in  spoofing  a  server  depends  on  the  event  that  the  server 
crashes.  Or,  if  an  attacker  learns  how  to  compromise  one  disk  of  a  replicated  server,  then  he  can  easily 
compromise  the  replicas  too;  the  chance  of  bringing  down  an  entire  service  depends  on  the  likelihood  of 
success  of  the  original  attack.  In  our  method  we  allow  users  to  express  such  dependencies.  Representing 
dependence  between  events  allows  us  to  model  phenomena  such  as  correlated  attacks,  where  local  attacks 
might  not  succeed,  but  when  they  occur  in  tandem  or  in  succession  they  can  have  a  severe  effect  on  the 
system. A distributed denial-of-service attack is an example of a correlated attack (see CERT advisory CA-
2000-0). Representing dependence also allows us to handle cascading effects, where one fault triggers
another,  which  then  triggers  another,  and  so  on.  While  it  is  cleaner  to  design  a  system  to  avoid  cascading 
effects  (e.g.,  by  using  a  strict  locking  protocol  to  avoid  cascading  aborts  in  a  transactional  database), 
in  practice  it  may  be  impossible  to  anticipate  faults  induced  by  a  system's  environment  that  violates  the 
assumptions  made  by  the  system's  original  designer.  Since  survivability  is  of  particular  concern  to  those 
building  systems  of  systems,  system  architects  will  have  to  face  the  possibility  of  cascading  effects  in  their 
analysis. 


Third,  survivability  analysis  should  also  be  service  dependent.  For  example,  the  architect  for  a 
banking  system  might  choose  to  focus  on  the  check  clearing  service  as  being  critical,  although  the  banking 
system  provides  other  services  such  as  accounting,  auditing,  and  cash  distribution;  for  a  different 
analysis,  cash  distribution  might  be  the  critical  service  to  focus  on.  Taking  into  consideration  the 
specific  service  a  system  is  to  provide  enables  more  targeted  analysis,  which  is  often  amenable  to  fully 
automated  support.  Also  a  method  that  focuses  the  architect's  attention  on  specific  services  rather  than  the 
general  system  design  is  likely  to  be  more  appreciated  and  better  understood  by  the  end  customer 
(who  cares  about  the  reliability  of  the  applications'  services).  The  analyses  in  our  method  are  all  driven  by 
the  properties  that  the  architect  specifies  as  they  relate  to  a  critical  service. 

Finally,  survivability  analysis  deals  with  multiple  dimensions.  It  simultaneously  deals  with 
functional  correctness  (modeling  the  service  itself),  fault-tolerance  (modeling  the  effects  of  accidental 
failures),  security  (modeling  the  effects  of  malicious  attacks),  reliability  (the  likelihood  of  a  service 
finishing),  performance  (network  latency),  and  cost.  To  achieve  this  goal,  the  analytical  approach 
described in this paper combines several different kinds of analysis techniques into one framework.

The  next  section  introduces  constrained  Markov  Decision  Processes  which  form  the  basis  for 
reliability,  latency,  and  cost-benefit  analysis.  A  general  overview  of  our  method  appears  in  Section  3. 
We  describe  a  small  example  based  on  the  United  States  Payment  System  in  Section  4,  which  we 
use  as  a  running  example  throughout  the  remainder  of  the  paper.  Section  5  provides  additional  details 
related to each step in our method. Section 6 briefly describes a prototype tool Trishul that we have
implemented  based  on  our  method,  and  briefly  describes  two  case  studies  that  we  have  performed. 
Sections  7  and  8  discuss  related  work  and  conclusions  respectively. 


2  Model  of  Computation 

Our formal model is based on constrained Markov Decision Processes or simply CMDPs. CMDPs are a generalization of Markov chains, where the transition probabilities depend on the past history. CMDPs enable us to model history dependent transition probabilities and provide a framework to perform cost-benefit analysis. Our exposition of CMDPs is based on Altman [2]. A CMDP is a 5-tuple $(S, A, P, c, d)$ where:

• $S$ is a finite state space.

• $A$ is a finite set of actions. For a state $s \in S$, $A(s) \subseteq A$ is the set of actions available at state $s$.

• $P$ are transition probabilities, where $P_{sas'}$ is the probability of moving from state $s$ to $s'$ if action $a$ is chosen.

• $c : (S \times A) \to \mathbb{R}$ is the immediate cost, i.e., $c(s, a)$ denotes the cost of choosing action $a$ at state $s$. This cost will be related to the value function to be minimized.

• $d : (S \times A) \to \mathbb{R}^k$ is a k-dimensional vector of immediate costs. This will be related to cost constraints.

A  Markov  Decision  Process  (MDP)  is  a  CMDP  without  the  last  component  d. 

History at time $t$ (denoted by $h_t$) is the sequence of states encountered and actions taken up to time $t$. A policy $u$ takes into account the history $h_t$ and determines the next action at time $t$. Specifically, $u_t(a \mid h_t)$ is the probability of taking action $a$ given history $h_t$. A policy $u$ defines a value function $V^u : S \to \mathbb{R}$, where $V^u(s)$ is the expected cost of the actions taken if the CMDP uses policy $u$ and starts in state $s$ (the cost $c$ is used to define expected cost). The technical definition of $V^u$ can be found in [2]. Analogously, starting in state $s$ let the expected value of the immediate costs $d$ under the policy $u$ be denoted by $D^u(s)$. Since the result of $d$ is a k-dimensional vector, $D^u(s)$ is also a k-dimensional vector of real numbers. Assume that we are also given a k-dimensional vector $C = (c_1, \ldots, c_k)$, where $c_i$ is the cost constraint on the $i$-th component of $D^u(s)$. Our aim is to find a policy that minimizes the value function $V^u$ given the constraint imposed by the vector $C$, or

Given an initial state $s_0 \in S$, find a policy $u$ that minimizes $V^u(s_0)$ subject to $D^u(s_0) \le C$.

Remark:  Do  not  confuse  a  Markov  process  with  a  Markov  policy,  which  is  a  policy  where  the  probability  of 
an  action  depends  only  on  the  current  state  of  the  CMDP  and  not  the  entire  history. 


Example 2.1 Imagine a bakery where there can be at most 10 customers waiting at any time. At each time the bakery manager has the option of having one or two servers behind the counter. The state of the CMDP corresponds to the number of servers behind the counter and the number of customers waiting. The action at each state is to decide on how many servers should be behind the counter. In Figure 1 we show a few transitions. Consider the transition from state (S=1, C=m) to (S=2, C=m-1). The action label a=2 on the transition indicates that the manager decided to switch to two servers behind the counter. The probability that a waiting customer leaves with his/her order is 0.5 or 0.75 depending on whether there are one or two servers behind the counter. Notice that the probability that a customer gets serviced is higher when there are two servers behind the counter. Therefore, the transition from state (S=1, C=m) to (S=2, C=m-1) has probability 0.75. The rest of the transitions have a similar explanation. Given a state and an action, the probability that a customer is serviced in the next time period determines the cost function c. For example, the cost of the state-action pair ⟨(S=1, C=m), a=1⟩ is -0.5 because if action a=1 is chosen from that state, the expected number of customers that are serviced during the next time step is 0.5. Notice that the negative of the cost determines the throughput, i.e., the expected number of customers that are serviced in the next time period. The number of servers behind the counter determines the cost function d, i.e., two servers cost more than one. The aim of the manager is to maximize expected throughput (or minimize expected cost related to c) given a constraint on the wages of the servers. Achieving this goal can be easily seen as a problem of value maximization under cost constraints and naturally fits the CMDP framework. The optimal policy for this CMDP will indicate to the bakery manager when to change the number of servers behind the counter.
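
The bakery transitions and costs described in the example can be written down directly. The sketch below is a simplification under stated assumptions: customer arrivals are ignored, an empty queue stays empty at zero cost c, and the wage cost d is taken to be the number of servers. None of these choices is fixed by the example itself, and the function names are made up for illustration.

```python
# Probability that one waiting customer is served in a time step,
# indexed by the number of servers chosen (values from the example).
SERVICE_PROB = {1: 0.5, 2: 0.75}


def step_distribution(state, action):
    """Distribution over next states (servers, waiting) after choosing `action` servers."""
    _, waiting = state
    if waiting == 0:
        # Assumption: no arrivals are modelled, so an empty queue stays empty.
        return {(action, 0): 1.0}
    p = SERVICE_PROB[action]
    return {(action, waiting - 1): p, (action, waiting): 1.0 - p}


def cost_c(state, action):
    """Immediate cost c: the negative of the expected number of customers served."""
    _, waiting = state
    return -SERVICE_PROB[action] if waiting > 0 else 0.0


def cost_d(state, action):
    """Constraint cost d: assumed here to be one wage unit per server."""
    return float(action)


# The transition discussed in the example: (S=1, C=5) with action a=2.
dist = step_distribution((1, 5), 2)
```

Here dist assigns probability 0.75 to (S=2, C=4), matching the transition probability given in the example, and cost_c((1, 5), 1) evaluates to -0.5.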


Figure 1: A Bakery (S: number of servers; C: number of waiting customers)


3  The  General  Method 

In this section we provide a brief overview of our method; Section 5 gives more details about the techniques we use and our implementation. In steps 1, 2, and 3 we model the network, inject faults into our model, and specify survivability related properties. Then in steps 4, 5, and 6 we analyze the effects of faults, perform reliability and latency analysis, and perform cost-benefit analysis; these three analyses parallel the three kinds of questions posed in the introduction.

3.1  Step  1:  Model  the  Network 

First,  the  architect  models  a  networked  system,  which  can  be  done  using  one  of  many 
formalisms.  We  choose  to  use  state  machines  and  we  use  them  to  model  both  network  nodes  and 
links.  We  use  shared  variables  to  represent  communication  between  the  state  machines. 


3.2  Step  2:  Inject  Faults 

Both  links  and  nodes  may  be  faulty.  With  our  state  machine  model  of  the  networked 
system,  we  need  not  make  a  distinction  between  nodes  and  links  when  considering  faults.  That  is,  a 
link  is  simply  a  node  that  passes  data  between  two  other  nodes.  Injecting  a  fault  then  requires  first 
representing  that  a  fault  has  occurred  and  then  determining  the  behavior  of  the  faulty  node  for  each  kind 
of  fault  that  may  occur.  The  exact  behavior  of  a  faulty  node,  specified  by  the  architect,  depends  on  the 
application. 

To represent faults in our method, for each state machine representing a node, we introduce a special variable called fault, which can range over a user-specified set of symbolic values. For example, the following declaration states that there are three modes of operation for a node, representing whether it is in the normal mode of operation, failed, or compromised by an intruder.

fault:  {  normal,  failed,  intruded } 


Given  this  simple  representation,  we  can  then  choose  to  specify  the  precise  behavior  of  the 
node  in  each  mode  of  operation.  For  example,  for  any  given  state  we  can  specify  that  the  machine 
makes  a  transition  from  the  normal  mode  of  operation  to  one  of  the  abnormal  modes  (failed  or  intruded) 
and  further  specify  what  state  the  machine  is  in  once  such  a  transition  occurs.  We  also  have  the  option  of 
leaving  state  transitions  completely  nondeterministic. 
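
To make the role of the fault variable concrete, here is a small Python sketch of its nondeterministic evolution. It is an analogy, not NuSMV input. The names Fault and next_fault_modes are invented for this example, and the sketch assumes that abnormal modes are absorbing, which the method itself leaves to the architect.

```python
from enum import Enum


class Fault(Enum):
    NORMAL = "normal"
    FAILED = "failed"
    INTRUDED = "intruded"


def next_fault_modes(current, abnormal_modes=(Fault.FAILED, Fault.INTRUDED)):
    """Return the set of fault modes a node may be in after one step."""
    if current is Fault.NORMAL:
        # Nondeterministic choice: stay normal or enter any permitted abnormal mode.
        return {Fault.NORMAL, *abnormal_modes}
    # Assumption of this sketch: once abnormal, the node stays in that mode.
    return {current}


# A link that can only fail (never be intruded) is modelled by restricting the modes.
link_modes = next_fault_modes(Fault.NORMAL, abnormal_modes=(Fault.FAILED,))
```

Returning a set of successors rather than a single value is what keeps the transition nondeterministic: a model checker explores every member of the set.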

3.3  Step  3:  Specify  Survivability  Properties 

The  architect  specifies  properties  related  to  survivability  using  some  kind  of  formal  logic.  In  our 
method,  we  use  a  temporal  logic  called  Computation  Tree  Logic  (CTL),  but  other  temporal  logics  such 
as  Linear  Time  Logic  [15]  would  also  be  appropriate. 

In  this  paper,  we  focus  on  two  classes  of  survivability  properties:  fault  and  service  related.  The 
first  class  captures  properties  of  the  networked  system  under  scrutiny  when  it  enters  a  faulty  state.  The 
second  class  captures  properties  specific  to  the  system's  services. 

3.4  Step  4:  Generate  Scenario  Graphs 

Given  a  state  machine  model,  M,  of  the  networked  system  (with  injected  faults)  and  a 
survivability  property,  P,  we  then  generate  a  scenario  graph,  which  is  a  concise  representation  of  a 
set  of  traces  of  M  with  respect  to  P.  For  fault  properties,  a  fault  scenario  graph  represents  all 
system  traces  that  end  in  a  faulty  state;  for  service  properties,  a  service  success  (fail)  scenario  graph 
represents  all  system  traces  in  which  an  issued  service  successfully  finishes  (fails  to  finish).  An 
architect  can  use  scenario  graphs  to  visualize  the  effects  of  injected  faults  on  a  certain  service. 
(In  the  operational  security  literature,  scenario  graphs  are  similar  to  attack  state  graphs  [13].)

3.5  Step  5:  Reliability  and  Latency  Analysis

Once  we  have  a  scenario  graph,  we  can  perform  further  analyses,  such  as  reliability  and 
latency  analysis.  First,  the  architect  specifies  the  probabilities  of  certain  events  of  interest,  such 
as  faults,  in  the  system.  Since  we  do  not  assume  independence  of  events,  we  use  a  formalism 
based  on  Bayesian  networks  [14]  to  specify  the  conditional  probabilities  of  the  events.  We  combine 
the  specified  probabilities  with  the  scenario  graph  to  obtain  an  MDP.  We  can  then  readily 
compute  reliability  and  latency  by  solving  for  optimal  policies  using  the  relevant  cost  functions  c, 
i.e.,  for  reliability  analysis  the  cost  function  is  identically  zero;  for  latency  analysis,  it  is  a  function 
of  the  times  associated  with  making  state  transitions. 

An  advantage  of  our  method  is  that  an  architect  need  not  specify  probabilities  for  all 
events;  an  MDP  can  have  both  probabilistic  and  nondeterministic  transitions. 

3.6  Step  6:  Cost-Benefit  Analysis

In  this  step  we  transform  the  MDP  from  Step  5  into  a  CMDP.  First  we  enhance  the 
MDP's  set  of  actions  A  with  actions  corresponding  to  decisions  that  an  architect  has  to  make. 
For  example,  these  additional  actions  might  correspond  to  upgrading  links  to  produce  a  more 
reliable/faster  system,  and  the  architect  must  decide  which  links  to  upgrade.  Each  added  action 
has  a  cost;  the  architect  wants  to  simultaneously  minimize  cost  and  maximize  some  benefit  (e.g., 
reliability).  Thus,  we  also  associate  costs  with  these  actions  and  provide  constraints  on  these  costs 
(i.e.,  specify  the  function  d  in  the  definition  of  CMDPs).  The  optimal  policy  corresponding  to  the 
CMDP  so  constructed  provides  the  architect  with  the  optimal  decision  under  the  specified  cost 
constraints. 


4  Example

We  consider  a  simplified  model  of  the  United  States  Payment  System,  depicted  in  Figure  2. 
There  are  three  levels  of  institutions:  Federal  Reserve  Banks  at  the  top,  money  centers  in  the 
middle,  and  small  banks  at  the  bottom.  If  banks  are  connected  to  the  same  money  center,  then 
transactions  between  them  are  handled  by  the  money  center;  there  is  no  need  to  go  through  the 
Federal  Reserve  Banks.  For  a  detailed  description  of  the  system  see  [11]. 

To illustrate the architecture, suppose a customer A writes a $50 check to customer C so that the check has a source address Bank-A and destination address Bank-C. The following steps occur for the issued check to clear:

1.  Bank-A and Bank-C are not connected through a money center, so the check is then sent to a money center connected to Bank-A. In this case, let's choose money center MC-1.

2.  The  check  is  then  transferred  to  the  Federal  Reserve  Bank  closest  to  MC-1,  in  this  case  FRB-2. 

3.  The  check  is  then  transferred  to  the  Federal  Reserve  Bank  that  has  jurisdiction  over  Bank-C,  in  this  case 
FRB-3. 

4.  The  check  finally  makes  its  way  to  Bank-C  through  the  money  center  MC-3.

In  Figure  2  the  path  of  the  check  is  shown  using  dot-dashed  lines. 
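
The four clearing steps amount to a small routing rule, sketched below. The two lookup tables are a hypothetical, reduced topology that covers only the institutions named in the text (including the assumption that Bank-B shares MC-1 with Bank-A); the real Figure 2 contains more links, and the function route is not part of our tool.

```python
# Hypothetical, reduced topology tables; Figure 2 has more institutions and links.
MONEY_CENTER = {"Bank-A": "MC-1", "Bank-B": "MC-1", "Bank-C": "MC-3"}
RESERVE_BANK = {"MC-1": "FRB-2", "MC-3": "FRB-3"}


def route(source, destination):
    """Return the list of institutions a check visits on its way to clearing."""
    if source == destination:
        return [source]  # assumption: the bank clears the check locally
    src_mc = MONEY_CENTER[source]
    dst_mc = MONEY_CENTER[destination]
    if src_mc == dst_mc:
        # Same money center: no need to involve the Federal Reserve Banks.
        return [source, src_mc, destination]
    src_frb, dst_frb = RESERVE_BANK[src_mc], RESERVE_BANK[dst_mc]
    frbs = [src_frb] if src_frb == dst_frb else [src_frb, dst_frb]
    return [source, src_mc, *frbs, dst_mc, destination]


path = route("Bank-A", "Bank-C")
```

With these tables, path lists Bank-A, MC-1, FRB-2, FRB-3, MC-3, and Bank-C in that order, which is the sequence of steps 1 to 4.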


5  Detailed  Description 

We now present each step in our method in more detail, illustrating them with the check clearing example.

5.1  Step  1:  Model  the  Network 

We  model  each  node  and  link  in  the  system  as  a  finite  state  machine,  and  the  entire 
networked  system  as  the  composition  of  these  machines.  In  our  implementation,  we  use  the  model 
checker  NuSMV  [1],  and  hence  we  use  NuSMV's  input  language  to  describe  the  state  machines
representing  a  given  system.  Using  this  off-the-shelf  model  checker  makes  it  convenient  for  us  at 
later  steps  in  our  method  to  perform  further  global  analyses;  NuSMV's  output  lets  us  automatically 
derive  information  that  we  would  otherwise  have  to  reconstruct. 

In our banking example, we use state machines to model the banks, the money centers, the Federal Reserve Banks, and the links. Each element in the banking infrastructure corresponds to a MODULE description in NuSMV and communication is achieved by parameter passing. We make some simplifying assumptions in the model of our system: (1) There is just one user who issues checks; the source and destination addresses of these checks are decided nondeterministically, i.e., the source address can be bank A, B, or C, and similarly for the destination; (2) There is only one check active at any time, and the exact amount of the check is irrelevant.

5.2  Step  2:  Inject  Faults 

Next  we  inject  faults  in  our  model  by  including  a  special  state  variable  (fault)  with  each 
state  machine  to  indicate  the  mode  of  operation.  We  modify  the  specification  of  each  state  machine  to 
take  into  consideration  its  faulty  modes  of  operation. 

In  our  banking  example,  what  faults  we  inject  and  how  we  handle  them  in  our  model  are 
based  on  the  following  assumptions: 

• The only network elements that can be faulty are (1) links between the banks and the money centers; and (2) small banks, representing that penetration by a malicious intruder has occurred (i.e., fault = intruded). No other links or institutions may become faulty and banks cannot fail accidentally.

•  When  a  link  is  faulty,  it  blocks  all  messages  and  consequently  no  message  ever  reaches  the 
recipient. 

•  Links  may  become  faulty  at  any  time.  Thus,  in  our  finite  state  machine  model  of  a  link,  we 
allow  a  nondeterministic  transition  to  the  state  where  fault  is  equal  to  failed.  The  third  value 
intruded  for  the  variable  fault  is  not  used  in  this  case. 

•  Banks  can  sense  a  faulty  link  and  route  the  checks  accordingly. 

These  assumptions  show  how  we  take  into  consideration  the  semantics  of  the  application; 
e.g.,  we  are  implicitly  assuming  that  Federal  Reserve  Banks  are  impenetrable  and  links  between 
them  are  highly  reliable  and  secure. 

Our model reflects the following behavior. Under the normal mode of operation, a bank receives a check (nondeterministically issued by the user) with its source address. Depending on the destination address of the issued check, the bank either clears it locally or routes it to the appropriate money center. For example, if a check with source address A and destination address B is issued, then it is sent to the money center MC-1 and then sent to bank B. On the other hand, a check with source address A and destination address C has to clear through the Federal Reserve Banks (as in Figure 2). If a bank is faulty, then checks are routed arbitrarily by the intruder (thereby ignoring the check's destination address). A bank can at any time nondeterministically transition from the normal mode (fault=normal) to the intruded mode (fault=intruded). Once the bank is faulty it stays in that state forever.

The  precise  behavior  of  a  faulty  node  depends  on  the  application,  but  two  types  of  behaviors  under 
failure  conditions  are  common.  In  the  case  of  a  stuck-at  fault  the  node  becomes  stuck,  i.e.,  it  accepts  no 
input  on  its  channel  and  consequently  produces  no  output.  A  node  with  a  Byzantine  fault  exhibits 
completely  nondeterministic  behavior,  i.e.,  accepts  any  inputs  and  produces  arbitrary  outputs.  A  Byzantine 
fault  can  also  be  used  to  model  an  intruded  node. 
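
The difference between the two fault types can be stated as the set of outputs a node may produce for a given input. The sketch below is illustrative only: the mode names and the function node_outputs are assumptions of this example, and None stands for "no output".

```python
def node_outputs(mode, message, domain):
    """Set of outputs a node may produce for one input message, by fault mode."""
    if mode == "normal":
        return {message}       # forwards the input unchanged
    if mode == "stuck_at":
        return {None}          # accepts no input, produces no output
    if mode == "byzantine":
        return set(domain)     # any output from the message domain is possible
    raise ValueError(f"unknown mode: {mode}")


# Possible destinations an intruded bank might route a check to (hypothetical domain).
destinations = ["MC-1", "MC-2", "MC-3"]
intruded = node_outputs("byzantine", "MC-1", destinations)
```

A stuck-at link corresponds to the blocked links of the banking example, while the Byzantine case corresponds to an intruded bank that routes checks arbitrarily.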

5.3  Step  3:  Specify  Survivability  Properties

In this step, we specify survivability properties in CTL, a logic chosen for convenience since the model checker we use accepts CTL specifications. Although CTL is a rich logic and allows us to express a variety of properties, we focus on two classes of survivability properties: fault and service related.

Fault  Related  Properties 

Suppose  we  want  to  express  the  property  that  it  is  not  possible  for  a  node  N  to  reach  a  certain 
unsafe  state  if  the  network  starts  from  one  of  the  initial  states.  The  precise  semantics  of  an  unsafe  state 
depends  on  the  application.  Let  the  atomic  proposition  unsafe  represent  the  property  that  node  N  is  in  an 
unsafe  state.  We  can  then  express  the  desired  property  in  CTL  as  follows: 


$$\mathbf{AG}(\neg\,\mathit{unsafe})$$

which  says  that  for  all  states  reachable  from  the  set  of  initial  states  it  is  true  that  we  never  reach  a  state 
where  unsafe  is  true.  The  negation  of  the  property  is 

$$\mathbf{EF}(\mathit{unsafe})$$

which  is  true  if  there  exists  a  state  reachable  from  the  initial  state  where  unsafe  is  true;  in  other  words  if  the 
network  starts  in  one  of  the  initial  states  it  is  possible  to  reach  an  unsafe  state.  The  atomic  proposition 
unsafe  can  stand  for  a  property  as  complex  as  we  desire.  It  could  mean  that  a  certain  critical  node  has 
entered  an  undesirable  state  (e.g.,  a  critical  valve  is  open  in  a  nuclear  power  plant),  or  it  could  mean  that  a 
certain  unauthorized  operation  occurred  at  a  critical  node.  For  example,  if  a  node  represents  a  computer 
protecting  a  critical  resource,  it  could  represent  the  fact  that  somebody  without  the  appropriate  authority 
has  logged  onto  the  computer.  The  precise  nature  of  a  faulty  state  depends  on  the  example  at  hand. 

Service  Related  Properties 

Many networked systems are built for distributed applications. For these cases we want to make sure that if a node N issues a service, then the service eventually finishes executing. Let the atomic proposition start express that a service was started, and finished express that the transaction is finished. The temporal logic formula given below expresses that, for every reachable state where a service starts, every path starting from that state eventually reaches a state where the service has finished, or in other words a service issued always eventually finishes.

$$\mathbf{AG}(\mathit{start} \rightarrow \mathbf{AF}(\mathit{finished}))$$

For  the  banking  example,  we  would  like  to  verify  that  a  check  issued  is  always  eventually  cleared.  This  can 
be  expressed  in  CTL  as: 


$$\mathbf{AG}(\mathit{checkIssued} \rightarrow \mathbf{AF}(\mathit{checkCleared}))$$

We  can  also  analyze  the  effect  of  a  compromised  node  (say  N).  Suppose  we  have  modeled  the  effect 
of  a  malicious  attack  on  node  N  (see  discussion  on  injecting  faults).  Now  we  can  check  whether  the 
desired  properties  are  true  in  the  modified  networked  system.  If  the  property  turns  out  to  be  true,  the 
network  is  resistant  to  the  malicious  attack  on  the  node  N.  This  type  of  analysis  is  useful  in  determining 
vulnerable  or  critical  nodes  of  a  network  with  respect  to  a  certain  service.  Using  this  analysis,  if  a  node 
is  found  to  be  vulnerable  or  critical  for  a  given  service  to  complete,  then  the  system  administrator  can 
deploy  sophisticated  intrusion  detection  algorithms  for  that  node  or  bolster  the  security  infrastructure 
around  it.  Thus  our  analysis  can  help  identify  the  critical  nodes  in  a  networked  system  and  therefore  help 
determine  whether  it  is  survivable  with  respect  to  desired  properties  of  a  given  service. 
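
For readers who prefer code to temporal logic, the two property classes can be checked on a small explicit-state graph as sketched below. This is not how a symbolic model checker such as NuSMV works internally; it is a set-based illustration that assumes every state has at least one successor (a state without successors is treated as never reaching finished). All function and state names are invented for this example.

```python
def reachable(init, succ):
    """States reachable from the initial states by following transitions."""
    seen, stack = set(init), list(init)
    while stack:
        s = stack.pop()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def holds_ag_not(init, succ, unsafe):
    """AG(not unsafe): no reachable state is unsafe."""
    return not (reachable(init, succ) & set(unsafe))


def af(states, succ, target):
    """Least fixpoint for AF(target): states from which every path reaches target."""
    z = set(target)
    changed = True
    while changed:
        changed = False
        for s in states:
            nxt = succ.get(s, ())
            if s not in z and nxt and all(t in z for t in nxt):
                z.add(s)
                changed = True
    return z


def holds_ag_implies_af(init, succ, start, finished):
    """AG(start -> AF(finished)) over the reachable states."""
    reach = reachable(init, succ)
    good = af(reach, succ, finished)
    return all(s in good for s in reach if s in start)


# Toy check-clearing graph: the check may be lost forever on a failed link.
succ = {"idle": {"issued"}, "issued": {"sent", "lost"},
        "sent": {"cleared"}, "cleared": {"cleared"}, "lost": {"lost"}}
service_ok = holds_ag_implies_af({"idle"}, succ, {"issued"}, {"cleared"})
```

In this toy graph service_ok is False, because the path through lost never reaches cleared; removing the lost branch makes the property hold.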

5.4  Step  4:  Generate  Scenario  Graphs


We automatically construct scenario graphs via model checking. When a specified property is not true in a given model, a model checker will produce a counterexample, i.e., a trace or a scenario that leads to a final state that does not satisfy the property. (Details of model checking, e.g., see [5], are not needed to understand our method.) We exploit this functionality of model checkers to generate scenario graphs; i.e., a scenario graph is a compact representation of all the traces that are counterexamples of a given property. For example, suppose we want to check whether during the execution of a networked system a certain event (e.g., buffer overflow) never happens. If the property is not true (i.e., buffer overflow can happen), the scenario graph encapsulates all sequences of states and transitions that lead the system to a state where a buffer overflow occurs.

Scenario  graphs  depict  ways  in  which  a  network  can  enter  an  unsafe  state  or  ways  in  which  a 
service  can  fail  to  finish.  Scenario  graphs  encapsulate  the  effect  of  local  faults  on  the  global  behavior 
of  the  network.  If  the  architect  models  malicious  attacks,  the  scenario  graph  is  a  compact  representation 
of  all  the  threat  scenarios  of  the  network,  i.e.,  a  set  of  sequences  of  intruder  actions  that  lead  the  network 
to  an  unsafe  state. 


Fault  Scenario  Graphs 

Recall that we can express the property of the absence of an unsafe reachable state as:

$$\mathbf{AG}(\neg\,\mathit{unsafe})$$

If  this  formula  is  not  true,  it  means  that  there  are  states  that  are  reachable  from  the  initial  state 
that  are  faulty. 


We briefly describe the construction of a scenario graph. Assume that we are trying to verify using model checking whether the specification of the network satisfies $\mathbf{AG}(\neg\,\mathit{unsafe})$. Usually, the first step in model checking is to determine the set of states $S_r$ that are reachable from the initial state. After having determined the set of reachable states, the algorithm determines the set of reachable states $S_{unsafe}$ that have a path to an unsafe state. The set of states $S_{unsafe}$ is computed using fix-point equations [5]. Let $R$ be the transition relation of the network, i.e., $(s, s') \in R$ iff there is a transition from state $s$ to $s'$ in the network. By restricting the domain and range of $R$ to $S_{unsafe}$ we obtain a transition relation $R_f$ that encapsulates the edges of the scenario graph. Therefore, the scenario graph is $G = (S_{unsafe}, R_f)$, where $S_{unsafe}$ and $R_f$ represent the nodes and edges of the graph respectively. In symbolic model checkers, like NuSMV, the transition relation and sets of states are represented using binary decision diagrams (BDDs) [4], a compact representation for boolean functions. All the operations described above can be easily performed using BDDs. The BDD for the transition relation $R_f$ is a succinct representation of the edges of the fault scenario graph. Since BDDs are capable of representing a large number of nodes, very large scenario graphs can be computed using our method.
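
The construction above can be mimicked with explicit Python sets in place of BDDs. The sketch below is only meant to show the order of the steps (forward reachability, backward reachability from the unsafe states, then restriction of the transition relation); it will not scale the way the symbolic version does, and the function name and toy graph are assumptions of this example.

```python
def fault_scenario_graph(init, transitions, unsafe):
    """Return (S_unsafe, R_f) for a transition relation given as (s, s') pairs."""
    succ, pred = {}, {}
    for s, t in transitions:
        succ.setdefault(s, set()).add(t)
        pred.setdefault(t, set()).add(s)

    # Step 1: S_r, the states reachable from the initial states.
    s_r, frontier = set(init), list(init)
    while frontier:
        s = frontier.pop()
        for t in succ.get(s, ()):
            if t not in s_r:
                s_r.add(t)
                frontier.append(t)

    # Step 2: S_unsafe, the reachable states that have a path to an unsafe state.
    s_unsafe = s_r & set(unsafe)
    frontier = list(s_unsafe)
    while frontier:
        t = frontier.pop()
        for s in pred.get(t, ()):
            if s in s_r and s not in s_unsafe:
                s_unsafe.add(s)
                frontier.append(s)

    # Step 3: R_f, the transition relation restricted to S_unsafe.
    r_f = {(s, t) for s, t in transitions if s in s_unsafe and t in s_unsafe}
    return s_unsafe, r_f


# Toy network: "x" can reach the unsafe state but is itself unreachable.
edges = [("i", "a"), ("a", "bad"), ("i", "ok"), ("ok", "ok"), ("x", "bad")]
nodes, graph_edges = fault_scenario_graph({"i"}, edges, {"bad"})
```

In the toy network, the state ok is dropped because it cannot reach bad, and x is dropped because it is not reachable from the initial state.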

Service  Success/Fail  Scenario  Graphs 

In  the  case  of  services,  we  are  interested  in  verifying  that  every  service  started  always  eventually 
finishes.  Recall  that  we  express  this  property  in  CTL  as: 

$$\mathbf{AG}(\mathit{start} \rightarrow \mathbf{AF}(\mathit{finished}))$$

Since  we  allow  several  nodes  to  be  faulty,  in  our  experience  we  find  that  most  of  the  time  this 
property  fails  to  hold.  Thus  more  interestingly,  during  the  model  checking  procedure,  we  derive  two 
graphs:  a  service  success  scenario  graph  and  a  service  fail  scenario  graph.  The  success  scenario  graph 
captures  all  the  traces  in  which  the  service  finishes;  the  fail  scenario  graph,  all  the  traces  in  which  the  service 
fails  to  finish.  These  scenario  graphs  are  constructed  using  a  procedure  similar  to  the  one  described  for  the 
fault  scenario  graphs.

In our banking example, issuing a check corresponds to the start of a service. The scenario graph shown in Figure 3 shows the effect of link failures on the check clearing service for a check issued with source address Bank-A and destination address Bank-C (the start event is labeled as issueCheck(Bank-A,Bank-C) in the figure). The event corresponding to sending a check from location L1 to L2 is denoted as sendCheck(L1,L2). The predicates up(Link-A-2) and down(Link-A-2) indicate whether Link-A-2 is up or down. Recall that we allow links to fail nondeterministically. Therefore, an event sendCheck(Bank-A,MC-2) is performed only if Link-A-2 is up, i.e., up(Link-A-2) is the pre-condition for event sendCheck(Bank-A,MC-2). If a pre-condition is not shown, it is assumed to be true. Note that a fault in a link can also be construed as an intruder taking over the link and shutting it down. From the graph it is easy to see that a check clears if Link-A-2 and Link-C-3 are up, or if Link-A-2 is down and Link-A-1 and Link-C-3 are up. We modified the model checker NuSMV to produce such scenario graphs automatically.

For realistic examples scenario graphs can be extremely large. Therefore, it is not feasible to enumerate all the scenarios or traces corresponding to a scenario graph. We developed a querying process by which an architect can select a subset of scenarios. First an architect identifies events of interest in the network; then, using these events as alphabet symbols, the architect provides a regular expression to specify the traces of interest. Consider the scenario graph shown in Figure 3 and this regular expression over the alphabet $\Sigma$:

$\Sigma^*$ sendCheck(FRB-2,FRB-3) $\Sigma^*$

This query captures the architect's interest in all traces where the check is transferred from FRB-2 to FRB-3, as denoted by the event sendCheck(FRB-2,FRB-3). A trace that satisfies the regular expression is shown by a dotted line in Figure 3.
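
A query of this form can be imitated on an explicit list of traces with an ordinary regular expression library. The sketch below uses Python's re module and assumes that traces are lists of event names containing no spaces; the traces themselves are made up and are not read off Figure 3. Our tool evaluates queries on the scenario graph rather than on enumerated traces.

```python
import re


def select_traces(traces, event):
    """Keep the traces in which `event` occurs, i.e. those matching Sigma* event Sigma*."""
    # Events are joined with spaces, so the event must be delimited by a space
    # or by the start or end of the encoded trace.
    pattern = re.compile(r"(?:^| )" + re.escape(event) + r"(?: |$)")
    return [trace for trace in traces if pattern.search(" ".join(trace))]


# Hypothetical traces for illustration.
traces = [
    ["issueCheck(Bank-A,Bank-C)", "sendCheck(Bank-A,MC-2)",
     "sendCheck(MC-2,FRB-2)", "sendCheck(FRB-2,FRB-3)"],
    ["issueCheck(Bank-A,Bank-C)", "down(Link-A-2)", "down(Link-A-1)"],
]
selected = select_traces(traces, "sendCheck(FRB-2,FRB-3)")
```

Only the first trace is selected, since it is the only one that contains the event sendCheck(FRB-2,FRB-3).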

5.5  Step  5:  Reliability  and  Latency  Analysis 

Once  we  have  generated  scenario  graphs,  we  can  perform  reliability  and  latency  analysis.  First, 
we  need  to  incorporate  probabilities  of  various  events  into  a  given  scenario  graph  to  produce  an 
MDP;  then  using  the  MDP  we  compute  reliability  and  latency  by  calculating  the  value  function 
corresponding  to  the  optimal  policy. 

We first explain this analysis using the banking example and then provide a formal explanation. Let the boolean state variable A1 indicate whether Link-A-1 is up, so $\neg A1$ corresponds to Link-A-1's being down. Analogously, A2 and C3 are the boolean variables corresponding to Link-A-2's and Link-C-3's being up. In general an event will be associated with a boolean variable and the negation of the variable will denote that the event did not occur; we will use the boolean variable and the event it represents synonymously, e.g., event A1 corresponds to Link-A-1's being up.

We now explain how we handle dependencies between events. Assume that event A2 is dependent on A1 and there are no other dependencies. Let $P(A1)$ and $P(C3)$ both be $1/2$, where $P(A1)$ and $P(C3)$ are the probabilities of Link-A-1 and Link-C-3 being up. The probability of event A2 depends on the event A1, and we give its conditional probabilities as:

$$P(A2 \mid A1) = 1/2$$

$$P(A2 \mid \neg A1) = 1/4$$

Figure  3:  A  Simple  Scenario  Graph


These probabilities reflect that if Link-A-1 is down, it is more likely that Link-A-2 will go down. In general, if an event $A$ depends on the set of events $\{A_1, \ldots, A_k\}$, then the probability of $A$ has to be specified for each possible case in the set of events $\{A_1, \ldots, A_k\}$. For example, if $A$ depends on $\{A_1, A_2\}$, then $P(A \mid A_1 \wedge A_2)$, $P(A \mid \neg A_1 \wedge A_2)$, $P(A \mid A_1 \wedge \neg A_2)$, and $P(A \mid \neg A_1 \wedge \neg A_2)$ have to be specified. This technique is the Bayesian network formalism.

In our example, first we have to compute the probability of the two events $A2$ and $\neg A2 \wedge A1$. These events correspond to events up(Link-A-2) and down(Link-A-2) & up(Link-A-1) in the scenario graph. The probabilities for these events are derived below.

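$$\begin{aligned}
P(A2) &= P(A2 \mid \neg A1)\,P(\neg A1) + P(A2 \mid A1)\,P(A1) \\
&= \tfrac{1}{4}\left(1 - \tfrac{1}{2}\right) + \tfrac{1}{2} \cdot \tfrac{1}{2} \\
&= 3/8
\end{aligned}$$

$$\begin{aligned}
P(\neg A2 \wedge A1) &= P(\neg A2 \mid A1)\,P(A1) \\
&= (1 - P(A2 \mid A1))\,P(A1) \\
&= 1/4
\end{aligned}$$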
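
The two derivations can be checked with exact rational arithmetic. The short sketch below uses Python's fractions module and takes P(A1) = 1/2, P(A2 | A1) = 1/2, and P(A2 | not A1) = 1/4, which are the values consistent with the results 3/8 and 1/4 stated in the text; the variable names are chosen for this example only.

```python
from fractions import Fraction

# Inputs: probability that Link-A-1 is up, and the conditional
# probabilities that Link-A-2 is up given the status of Link-A-1.
p_a1 = Fraction(1, 2)
p_a2_given_a1 = Fraction(1, 2)
p_a2_given_not_a1 = Fraction(1, 4)

# Law of total probability for the event A2.
p_a2 = p_a2_given_not_a1 * (1 - p_a1) + p_a2_given_a1 * p_a1

# Joint probability of "A2 is down and A1 is up".
p_not_a2_and_a1 = (1 - p_a2_given_a1) * p_a1

print(p_a2, p_not_a2_and_a1)
```

Using Fraction rather than floating point keeps the results in the same form as the hand derivation.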

We  add  these  probabilities  (shown  inside  little  boxes)  to  the  relevant  edges  of  the  scenario  graph  in 
Figure  3.  Since  we  might  assign  probabilities  to  only  some  events  (typically  faults)  and  not  others,  we 
obtain  a  structure  that  has  a  combination  of  purely  nondeterministic  and  probabilistic  transitions.  In  our 
banking  example,  the  architect  might  assign  probabilities  only  to  events  corresponding  to  faults;  the  user 
of  the  banking  system  still  nondeterministically  issues  checks.  Intuitively,  nondeterministic  transitions  are 
actions  of  the  environment  or  the  user,  and  probabilistic  transitions  correspond  to  moves  of  the  adversary. 
If  we  view  nondeterministic  transitions  as  actions,  the  structure  obtained  after  incorporating  probabilities 
into  the  scenario  graph  is  an  MDP.  (In  the  distributed  algorithms  literature  [12],  structures  that  have  a 
combination  of  nondeterministic  and  probabilistic  transitions  are  called  concurrent  probabilistic 
systems.) 

We  now  explain  the  algorithm  to  compute  reliability  and  latency  by  first  considering  a  property 
about  services.  Recall  that  we  are  interested  in  the  following  property: 

$$\mathbf{AG}(\mathit{start} \rightarrow \mathbf{AF}(\mathit{finished}))$$


Let $G$ be the service success scenario graph corresponding to this property. Suppose each edge $s \to s'$ in $G$ has a cost $c(s \to s')$ associated with it. Now the goal of the environment, which is assumed to be malicious, is to devise an optimal policy, or equivalently to choose nondeterministic transitions, in order to minimize reliability or maximize latency. A value function $V$ assigns a value $V(s)$ to each state $s$ in the scenario graph. Next we describe an algorithm to compute the value function $V^*$ corresponding to this optimal policy. This algorithm is called policy iteration in the MDP literature. (Later we explain how the value function can be interpreted as worst case reliability or latency.) In the initial step, $V(s) = 1$ for all the states that satisfy the property finished, and for all other states $s$ we assume that $V(s) = 0$. A state $s$ is called probabilistic if transitions from that state are probabilistic. A state is called nondeterministic if it is not probabilistic. For all states $s$ that satisfy finished the value $V(s)$ is always 1; and for all other states the value function is updated as follows:

•  If s is nondeterministic then

\[ V(s) \leftarrow \min_{s' \in succ(s)} \bigl( c(s \to s') + V(s') \bigr) \]

•  If s is probabilistic then

\[ V(s) \leftarrow \sum_{s' \in succ(s)} p(s, s') \, \bigl( c(s \to s') + V(s') \bigr) \]

In the equations given above, succ(s) is the set of successors of state s, and p(s, s') is the
probability of a transition from state s to s'. Intuitively speaking, a nondeterministic move corresponds
to the environment choosing an action to minimize the value. The value of a probabilistic state is the
expected value of its successors. Starting from the initial values, the value function V is updated
according to the equations given above until convergence.
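The two update rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the tool: the function name `worst_case_value`, the dictionary-based graph encoding, the tolerance, and the five-state example graph are all hypothetical, and the example graph is not the banking example. States without successors that do not satisfy finished simply keep the value 0.

```python
def worst_case_value(succ, prob, cost, finished, tol=1e-12, max_iters=10_000):
    """Repeat the two update rules until the values stop changing.

    succ:     dict state -> list of successor states (every state is a key)
    prob:     dict state -> {successor: probability}, for probabilistic states only;
              a state absent from prob is treated as nondeterministic
    cost:     dict (s, t) -> edge cost c(s -> t); missing edges cost 0
    finished: set of states satisfying `finished`; their value stays 1
    """
    # Initial step: 1 on finished states, 0 everywhere else.
    V = {s: (1.0 if s in finished else 0.0) for s in succ}
    for _ in range(max_iters):
        delta = 0.0
        for s in succ:
            if s in finished or not succ[s]:
                continue  # value is fixed for finished states and dead ends
            vals = {t: cost.get((s, t), 0.0) + V[t] for t in succ[s]}
            if s in prob:
                # Probabilistic state: expected value over successors.
                new = sum(prob[s][t] * vals[t] for t in succ[s])
            else:
                # Nondeterministic state: the environment picks the minimum.
                new = min(vals.values())
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta <= tol:
            break
    return V


# Hypothetical graph: s0 is nondeterministic, a and b are probabilistic.
succ = {"s0": ["a", "b"], "a": ["done", "fail"], "b": ["done", "fail"],
        "done": [], "fail": []}
prob = {"a": {"done": 0.5, "fail": 0.5}, "b": {"done": 0.25, "fail": 0.75}}
V = worst_case_value(succ, prob, cost={}, finished={"done"})
print(V["s0"])  # 0.25: the environment steers toward the less reliable branch
```

With all edge costs set to zero, as here, the value at the initial state is a worst case probability of reaching a finished state. Convergence for graphs with cycles and nonzero costs is assumed rather than checked in this sketch.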

After the above algorithm converges, we end up with the desired value function V*. Let s_0 be
the initial state of the scenario graph.


•  If the cost, c, associated with the edges is zero, then V*(s_0) is the worst case reliability metric
corresponding to the given property, i.e., the worst case probability that if a service is issued it will
eventually finish.

•  If the cost, c, associated with the edges corresponds to the negative of the latency, then the value -V*(s_0)
corresponds to the worst case latency of the service, i.e., the worst case expected finishing time of a
service. Notice that in this setting, minimizing cost corresponds to maximizing latency.


Consider the scenario graph shown in Figure 3. The worst case reliability computed by our algorithm is
(1/2 × 3/8) + (1/2 × 1/4) = 3/16 + 2/16 = 5/16. That is, the worst case probability that a check issued by Bank-A on
Bank-C is cleared is 5/16. The latency in days of each event is shown in Figure 3 inside square brackets;
e.g., the latency of the event sendCheck(FRB-3, MC-3) is 2 days. The worst case latency computed by our
algorithm is 4 days.

5.6  Step  6:  Cost-Benefit  Analysis 

Finally,  we  add  more  cost  information  and  extend  our  MDP  to  a  CMDP.  Again,  we  will  explain 
this  analysis  using  the  running  example  first.  Suppose  an  architect  wants  to  upgrade  some  links  to 
improve  the  overall  robustness  of  the  system.  Three  links  Link-A-1,  Link-A-2,  and  Link-C-3  are 
candidates for being upgraded. Assume that if Link-A-1 and Link-C-3 are upgraded then the probabilities
P(A1) and P(C3) each increase to 3/4. If Link-A-2 is upgraded then the probability of Link-A-2
being up is given below.

P(A2  \A1)^V4


P(A2  \A1)^3/8 

If the links are not upgraded, then the probabilities do not change. In addition to the actions
corresponding to the nondeterministic transitions, three extra actions (corresponding to upgrading
Link-A-1, Link-A-2, and Link-C-3) are added to the action set A of the MDP that was constructed previously.
Moreover, assume that the architect has a cost constraint so that at most two links can be upgraded.
Therefore, in this case we obtain a CMDP, where the cost of upgrading the links is expressed by the
cost function d (Section 2). Algorithms for finding optimal policies in the case of CMDPs exist but are
complicated [2]. Fortunately, our problem is easier because the decisions to upgrade the links are static,
i.e., they do not depend on the state of the system. In this case the optimal decision can be found by
solving an auxiliary integer programming problem. With each of the three links Link-A-1, Link-A-2, and
Link-C-3 we associate 0-1 variables X_A1, X_A2, and X_C3. Intuitively, X_A1 = 1 indicates that Link-A-1 has
been upgraded. Now the worst case reliability is a function of X_A1, X_A2, and X_C3. We denote this by
Rel(X_A1, X_A2, X_C3). Our aim is to maximize the worst case reliability Rel(X_A1, X_A2, X_C3) subject to the
constraint that at most two links can be upgraded, i.e.,


X_A1 + X_A2 + X_C3 ≤ 2

This is a non-linear integer programming problem. Although the problem in its full generality
is hard, several heuristics for solving this class of problems have been studied [16]. For our example,
Figure 4 lists the worst-case reliability for the three possible cases. It is clear that the best option is to
upgrade Link-A-1 and Link-C-3.
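For a handful of links, the 0-1 program can be solved by plain enumeration, as the following sketch shows. It is an illustration only: `best_upgrade` and `toy_rel` are hypothetical names, and the coefficients in `toy_rel` are invented placeholders that do not reproduce the values in Figure 4. In practice `rel` would rerun the reliability computation on the scenario graph with the upgraded probabilities.

```python
from itertools import product


def best_upgrade(rel, n_links, budget):
    """Try every 0-1 vector with at most `budget` ones and keep the best.

    rel:     function mapping a tuple of 0/1 decisions to a reliability value
    n_links: number of candidate links
    budget:  maximum number of links that may be upgraded
    """
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n_links):
        if sum(x) > budget:
            continue  # violates the cost constraint
        val = rel(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def toy_rel(x):
    """Placeholder objective with made-up coefficients (NOT the paper's Rel).

    The x_a1 * x_c3 term makes it non-linear, like the real objective.
    """
    x_a1, x_a2, x_c3 = x
    return 0.1 + 0.2 * x_a1 + 0.1 * x_a2 + 0.2 * x_c3 + 0.3 * x_a1 * x_c3


best_x, best_val = best_upgrade(toy_rel, n_links=3, budget=2)
print(best_x)  # (1, 0, 1) for this placeholder objective
```

Enumeration visits 2^n vectors, so it only suits small n; for larger instances the heuristics cited above would be needed.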


[Table: worst-case reliability for the three cases X_A1 = X_A2 = 1; X_A1 = X_C3 = 1; X_A2 = X_C3 = 1]

Figure 4: Table of Three Cases


6  Status 

We built a tool, Trishul, based on the ideas presented in this paper. We implemented all the
basic algorithms. We are finishing the graph visualization component and a customized editor.

We  also  finished  two  major  case  studies:  an  extended  banking  system  and  a  bond  trading  floor. 
Our  model  of  the  banking  system  is  much  more  complicated  than  the  simplified  example  presented  in  this 
paper.  For  example,  we  handle  protocols  such  as  Fedwire  and  SWIFT  (used  for  transfer  of  funds  and 
transmitting  financial  messages  respectively)  that  we  did  not  show  here.  The  entire  banking  system  model 
is  about  2,000  lines  of  NuSMV  code.  The  scenario  graph  has  about  25,000  nodes  and  computing  reliability 
and  latency  takes  only  a  few  minutes. 

We also modeled and analyzed the system architecture of a bond trading floor of a major
investment company in New York. The model is about 10,000 lines of NuSMV code and has about 100
state variables. Our tool found several attacks, two of which the architects considered serious.
One attack enabled a junior trader to acquire a head trader's password. The second attack
enabled a junior trader to obtain sensitive information from the company's database, i.e., a junior trader
could find out the nature of the pending trades. We gained valuable experience during
this case study. The most cumbersome part of the modeling process was the fault injection phase, because
the nature of the faults injected depended heavily on the security policies and technologies deployed
at that node. We plan to automate the fault injection process in the near future.


7  Related  Work 

Survivability  is  a  fairly  new  discipline,  and  viewed  by  many  as  distinct  from  the  traditional 
areas  of  security  and  fault-tolerance  [7].  The  Software  Engineering  Institute  uses  a  method  for  analyzing 
the  survivability  of  network  architectures  (called  SNA)  and  conducted  a  case  study  on  a  system  for 
medical  information  management  [8].  The  SNA  methodology  is  informal  and  meant  to  provide  general 
recommendations  of  "best  practices"  to  an  organization  on  how  to  make  their  systems  more  secure  or 
more reliable. In contrast, our method is formal and leverages automatic verification techniques
such as model checking. Other papers on survivability can be found in the Proceedings of the
Information Survivability Workshop [10].

Research on security by Ortalo, Deswarte, and Kaaniche [13] is closest to Step 4 of
our method. Their attack state graphs are similar to our scenario graphs. However, since we use
symbolic model checking to generate scenario graphs, represented by BDDs, we can handle extremely large
graphs. Moreover, in our method a scenario graph corresponds to a particular service; in contrast, their
graph corresponds to a global model of the entire system. We are currently investigating how to
incorporate concepts and analysis techniques presented in their paper [13] into our method.

Fault  injection  is  a  well-known  technique  in  the  fault  tolerance  community.  We  allow  the 
designer  to  specify  any  kind  of  fault,  and  thus  we  can  consider  a  wider  class  of  faults.  Moreover,  we 
allow  fault  events  to  be  dependent  and  thus  can  model  correlated  attacks.  Computing  reliability  is  also  not 
new.  There  is  a  vast  amount  of  literature  on  verifying  probabilistic  systems  and  our  algorithm  for 
computing  reliability  draws  on  this  work  [6].  The  novelty  in  our  work  is  the  systematic  combination  of 
different  techniques  into  one  method. 

8  Summary  of  Contributions  and  Future  Work 

Survivability  has  become  increasingly  important  with  society's  increased  dependence  on  critical 
infrastructures  run  by  computers.  In  this  paper,  we  presented  in  a  single  framework  a  systematic  method 
for  analyzing  a  networked  system  for  survivability.  A  fundamental  contribution  of  our  work  is  to  use 
constrained  Markov  Decision  Processes  as  the  sole  underlying  mathematical  model  for  this  framework.  A 
second  contribution  is  the  natural  integration  of  a  set  of  analysis  techniques  from  disparate  communities 
into  this  framework:  model  checking  (popular  in  computer-aided  verification),  Bayesian  network  analysis 
(popular  in  artificial  intelligence),  probabilistic  analysis  (popular  in  hybrid  systems  and  queueing 
systems),  and  cost-benefit  analysis  (popular  in  decision  theory).  In  combination,  these  techniques  let  us 
provide  a  multi-faceted  view  of  the  networked  system.  This  holistic  view  of  a  system  is  at  the  core  of 
achieving  survivability  for  the  system's  critical  services. 

There  are  several  directions  for  future  work.  First,  we  plan  to  finish  the  prototype  tool  that 
supports  our  method.  We  are  working  on  several  case  studies,  including  protocols  used  in  an  electronic 
commerce  system.  Since  for  real  systems,  scenario  graphs  can  be  very  large,  we  plan  to  improve  the  display 
and  query  capabilities  of  our  tool  so  architects  can  more  easily  manipulate  its  output.  Finally,  to  make  the 
fault  injection  process  systematic,  we  are  investigating  how  best  to  integrate  operational  security  analysis 
tools such as COPS [9] into our method.


Survivability

•  What  if 

-  a  terrorist  hacker  brings  down  the  nation’s 
power  grid? 

-  an  act  of  Mother  Nature  causes  the  US 
banking  network  to  fail? 

•  Critical  infrastructures 

-  Utilities: gas, electricity, nuclear, water, ...

-  Communications: telephone, networks, ...

-  Financial: banking, trading, ...

-  Medical: emergency services, hospitals, ...


Survivability 

A system is survivable if it can continue to provide end services
despite the presence of faults.

Faults

-  Accidental or malicious

-  Not necessarily independent

=>  Finer-grained reliability analysis is enabled/required
(and more relevant).

Service-oriented

-  Exploit semantics of application

=>  Not all network nodes and links are treated equally.


Foundational Questions

•  What  is  the  difference  in  models  for  survivability  and 
those  for 

-  Fault-tolerant  distributed  systems? 

-  Secure  systems? 

•  Our  starting  point: 

-  Independence  assumption  goes  out  the 
window. 

-  Cost  must  be  included  in  the  equation. 


Key  properties 


•  Mission  Focus 

-  Identification  of  risks  and  trade-offs 

-  Alternative  means  to  meet  mission 

•  Assume imperfect defenses


2 Parts in Cooperation

•  The Survivable Network Architecture Method

-  Measures existing systems for survivability

-  Focuses on user and intruder models

•  Inverting Formal Methods Techniques for Survivability

-  Applies model checking and other techniques
to survivability

-  Allows systems that are formally specified to
submit to survivability analysis


The  Survivable  Network  Analysis  Method 

•  Focus 

-  early  phase  of  life  cycle 

-  applications  as  well  as  system  infrastructure 

-  tailorable  depending  on  stage  of  development. 

•  Three  options  for  SNA  analysis 

-  survivability  architecture 

-  survivability  requirements 

-  mission lifecycle


Architectural Focus

•  Capture  assumptions  such  as  boundaries  and  users 

•  Support  system  evolution  as  requirements  and 
technologies  change 

-  evolving  functional  requirements 

-  trend  to  loosely  coupled 

-  requirements  for  integration  across  diverse  systems 

•  Assist  with  product  selection  and  integration  with 
respect  to  rapidly  changing  security  product  world 


General  Method 

•  Identify  essential  services  with  normal  usage. 

•  Generate  intrusion  scenarios  which  are  use  cases  for 
intruder 

•  Evaluate  system  in  terms  of  response  to  scenarios 

-  Requirements:  propose  response  to  intrusions 

-  Architecture:  evaluate  system  and  operational 
behavior 

•  Mission  impact 

-  applications  as  well  as  system  components 

-  stakeholders input essential


Survivability Architecture

•  Make  recommendations  for  survivability 
improvements 

•  Identify  decision  and  tradeoff  points  -  areas  of  high 
risk 

•  Identify  trade-offs  with  other  software  quality  attributes 
-  safety,  reliability,  performance,  usability 


The Survivable Network Analysis Method


Determining Survivability Strategies

Survivability Map

Intrusion      Softspot   Architecture      Resistance   Recognition   Recovery
Scenario       Effects    Strategies for
-------------  ---------  ----------------  -----------  ------------  ---------
(Scenario 1)              Current
                          Recommended
...
(Scenario n)              Current
                          Recommended

•  Roadmap for management evaluation and action


Option: Survivability Requirements

•  Identify  requirements  for  mission-critical  functionality 

-  minimum  essential  services 

-  graceful  degradation  of  services 

-  restoration  of  full  services 

•  Identify  explicit  requirements  for 

-  recovery 

-  recognition 

-  resistance 



Option:  Mission  Lifecycle 

•  Factor  survivability  into  the  development  and  operational 
lifecycle 

•  Capture  security  and  survivability  assumptions 

-  boundaries,  users 

•  Identify  survivability  decision  points 

-  impact  of  changes  on  recovery,  intrusion  detection,  etc. 




Benefits  of  the  SNA 

•  Clarified  requirements 

•  Documented  basis  for  system  decisions 

•  Basis  to  evaluate  changes  in  architecture 

•  Early  problem  identification 

•  Increased  stakeholder  communication 


Additional  Information 

•  SNA  Case  Study:  The  Vigilant  Healthcare  System 

-  IEEE  Software:  July/August  1999 

•  Survivability: Protecting Your Critical Systems

-  IEEE  Internet  Computing:  Nov/December  1999 

•  Web  site:  IEEE  article  and  other  reports 
www.sei.cmu.edu/organization/programs/nss/surv-net-tech.html


Scenario Set


Phase 1

Network Model = A set of concurrently executing Finite State Machines.

Survivability Property = A predicate in CTL.

A set of related examples.


Network Model


Processes 

-  Nodes  and  links  are  processes  (i.e.,  FSMs) 

•  banks,  money  centers,  federal  reserve  banks,  and  links 

-  Communication  via  shared  variables  (i.e.,  finite  queues) 

•  representing  channels,  and  hence  interconnections. 

Failures 

-  Faults  represented  by  special  state  variable 

•  fault:{normal,  failed,  intruded} 

-  Links  and  banks  can  fail  at  any  time 

•  Failed  link  blocks  all  traffic. 

•  Failed  bank  routes  all  checks  to  an  arbitrarily  chosen  money  center. 

-  Money  centers  and  federal  reserve  banks  do  not  fail. 




Survivability  Properties 

•  Fault-related 

-  Money  never  deposited  into  wrong  account. 

•  AG(¬error)

•  Service-related

-  A check issued eventually clears.

•  AG(checkIssued → AF(checkCleared))
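A property of the form AG(p → AF q) can be checked on a small explicit graph with two standard fixpoint computations, sketched below. This is an explicit-state illustration, not the symbolic algorithm a model checker such as NuSMV uses; the function names and the two toy graphs (`ok_graph`, `bad_graph`) are hypothetical.

```python
def af(succ, target):
    """States from which every path eventually reaches `target` (least fixpoint)."""
    sat = set(target)
    changed = True
    while changed:
        changed = False
        for s, nxt in succ.items():
            # Add s once it has successors and all of them are already in sat.
            if s not in sat and nxt and all(t in sat for t in nxt):
                sat.add(s)
                changed = True
    return sat


def reachable(succ, init):
    """All states reachable from `init` by depth-first search."""
    seen, stack = {init}, [init]
    while stack:
        s = stack.pop()
        for t in succ[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def holds_ag_implies_af(succ, init, p_states, q_states):
    """Check AG(p -> AF q): every reachable p-state must satisfy AF q."""
    af_q = af(succ, q_states)
    return all(s in af_q for s in reachable(succ, init) if s in p_states)


# Hypothetical graphs: in the second one a check can be lost forever.
ok_graph = {"issued": ["sent"], "sent": ["cleared"], "cleared": ["cleared"]}
bad_graph = {"issued": ["sent", "lost"], "sent": ["cleared"],
             "cleared": ["cleared"], "lost": ["lost"]}

print(holds_ag_implies_af(ok_graph, "issued", {"issued"}, {"cleared"}))   # True
print(holds_ag_implies_af(bad_graph, "issued", {"issued"}, {"cleared"}))  # False
```

In `bad_graph` the state `lost` loops on itself without ever reaching `cleared`, so `issued` fails AF(cleared) and the property is violated.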
Inputs to Model Checker


•  State machines

MODULE main
  fault: {normal, fail-stop, Byzantine, hacker-attack, terrorist-attack, link-down, ...}
  next(fault) := case
    fault = normal : {normal, fail-stop, ...}
    P_i(v_n) : {hacker-attack, terrorist-attack}
    default : fault
  esac

MODULE bank(user, <other input parameters>)
  next(...) := case
    P_j(v_m) & fault = normal => <route check to user.destination>

•  Property

AG not(faulty)
A Service Success Scenario Graph


issueCheck(A, C)

up(a2)        down(a2) & up(a1)

send(A, MC-2) → send(MC-2, FRB-1) → send(FRB-1, FRB-3)

send(A, MC-1) → send(MC-1, FRB-2) → send(FRB-2, FRB-3)

send(FRB-3, MC-3) → [up(c1)] send(MC-3, C) → debitAccount
Overview of Method

[Figure: Phase 1: Network Model and Survivability Property → Checker → Scenario Graph.
Phase 2: Scenario Graph with Annotations (e.g., probabilities) and queries (Reliability Query, Cost Query, etc.) → Analyzer → Scenario Set.]


Phase  2:  Reliability  Analysis 


•  Annotations  =  Probabilities 

-  Use  Bayesian  Networks  to  model  dependence  of 
events. 

•  Symbolic 

-  Use  symbolic  probabilities 

•  high,  medium,  low 

-  Use  NDFA  theory  to  compute  scenario  set. 

•  Continuous 

-  Use  numeric  probabilities 

•  [0.0,  1.0] 

-  Use  Markov  Decision  Processes  to  model  both 
nondeterministic and probabilistic transitions.


State of Project

•  Tools 

-  Trishul  tool 

•  Uses  NuSMV  model  checker,  done  by  Somesh  Jha 

-  New  tool 

•  Uses  SPIN,  ongoing  by  Oleg  Sheyner 

•  Case  studies  (Jha,  Sheyner) 

-  Trading  floor  model  of  major  investment  bank  (being 
“sanitized”  by  Jha) 

•  10K  lines  of  NuSMV 

•  half-million  nodes  in  scenario  graph 

•  50  threat  scenarios 

•  45  found  by  system 

•  5  new  threat  scenarios  found 

•  With  independence  assumption,  too  many  misses. 

-  B2B e-commerce NYC start-up (Jha)

•  50K  lines  of  Statecharts,  2  million  NuSMV  beyond  capability  of  tool 


Sample  Open  Research  Questions 


•  Foundational 

-  What  is  an  appropriate  fault-model  for  survivable  systems? 

•  Malicious  attack  versus  Byzantine  failure 

-  What  role  does  “service-oriented”  really  play  in  the  notion  of 
survivability? 

-  What  is  an  appropriate  logic  for  describing  survivability 
properties? 

•  What  logic  or  subset  of  CTL  corresponds  to  finite  scenario  graphs? 

•  Pragmatic 

-  How  applicable  is  the  CMDP  model  for  other  critical 
infrastructure  examples? 

•  How  far  can  we  push  the  analysis  techniques? 

-  What  combination  of  tools  can  further  automate  the  analysis? 

•  Linear  programming  packages,  theorem  provers,  ... 

-  How  can  you  design  a  system  for  survivability? 